All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
@ 2016-02-03 10:47 Jiri Pirko
  2016-02-03 10:47 ` [patch net-next RFC 1/6] Introduce devlink infrastructure Jiri Pirko
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Jiri Pirko @ 2016-02-03 10:47 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

From: Jiri Pirko <jiri@mellanox.com>

There a is need for some userspace API that would allow to expose things
that are not directly related to any device class like net_device of
ib_device, but rather chip-wide/switch-ASIC-wide stuff.

Use cases:
1) get/set of port type (Ethernet/InfiniBand)
2) monitoring of hardware messages to and from chip
3) setting up port splitters - split port into multiple ones and squash again,
   enables usage of splitter cable
4) setting up shared buffers - shared among multiple ports within one chip

First patch of this set introduces a new generic Netlink based interface,
called "devlink". It is similar to nl80211 model and it is heavily
influenced by it, including the API definition. The devlink introduction patch
implements use cases 1) and 2). Other 2 are in development atm and will
be addressed by follow-ups.

It is very convenient for drivers to use devlink, as you can see in other
patches in this set.

Counterpart for devlink is userspace tool called "dl". Command line interface
and outputs are derived from "ip" tool so it should be easy for users to get
used to it.

It is available here:
https://github.com/jpirko/devlink

Example usage:
butter:~$ dl help
Usage: dl [ OPTIONS ] OBJECT { COMMAND | help }
where  OBJECT := { dev | port | monitor }
       OPTIONS := { -v/--verbose }

butter:~$ dl dev show
0: devlink0: bus pci dev 0000:01:00.0

butter:~$ dl port help
Usage: dl port show [DEV/PORT_INDEX]
Usage: dl port set DEV/PORT_INDEX [ type { eth | ib | auto} ]

butter:~$ dl port show
devlink0/1: type ib ibdev mlx4_0
devlink0/2: type ib ibdev mlx4_0

butter:~$ sudo dl port set devlink0/1 type eth

butter:~$ dl port show
devlink0/1: type eth netdev ens4
devlink0/2: type ib ibdev mlx4_0

butter:~$ sudo dl port set devlink0/1 type auto

butter:~$ dl port show
devlink0/1: type eth(auto) netdev ens4
devlink0/2: type ib ibdev mlx4_0

Jiri Pirko (6):
  Introduce devlink infrastructure
  mlxsw: Implement devlink interface
  mlxsw: Implement hardware messages notification using devlink
  mlx4: Implement devlink interface
  mlx4: Implement hardware messages notification using devlink
  mlx4: Implement port type setting via devlink interface

 MAINTAINERS                                    |   8 +
 drivers/infiniband/hw/mlx4/main.c              |   7 +
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   8 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c        |   9 +
 drivers/net/ethernet/mellanox/mlx4/intf.c      |   9 +
 drivers/net/ethernet/mellanox/mlx4/main.c      | 129 +++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h      |   2 +
 drivers/net/ethernet/mellanox/mlxsw/core.c     |  39 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |  20 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |   2 +
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c |  20 +
 include/linux/mlx4/driver.h                    |   3 +
 include/net/devlink.h                          | 152 +++++
 include/uapi/linux/devlink.h                   |  84 +++
 net/Kconfig                                    |   7 +
 net/core/Makefile                              |   1 +
 net/core/devlink.c                             | 856 +++++++++++++++++++++++++
 17 files changed, 1313 insertions(+), 43 deletions(-)
 create mode 100644 include/net/devlink.h
 create mode 100644 include/uapi/linux/devlink.h
 create mode 100644 net/core/devlink.c

-- 
1.9.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [patch net-next RFC 1/6] Introduce devlink infrastructure
  2016-02-03 10:47 [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it Jiri Pirko
@ 2016-02-03 10:47 ` Jiri Pirko
  2016-02-11 14:31   ` Ivan Vecera
  2016-02-03 10:47 ` [patch net-next RFC 2/6] mlxsw: Implement devlink interface Jiri Pirko
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2016-02-03 10:47 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

From: Jiri Pirko <jiri@mellanox.com>

Introduce devlink infrastructure for drivers to register and expose to
userspace via generic Netlink interface.

There are two basic objects defined:
devlink - one instance for every "parent device", for example switch ASIC
devlink port - one instance for every physical port of the device.

This initial portion implements basic get/dump of objects to userspace.
Also, hardware message monitoring and port type setting is implemented.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 MAINTAINERS                  |   8 +
 include/net/devlink.h        | 152 ++++++++
 include/uapi/linux/devlink.h |  83 +++++
 net/Kconfig                  |   7 +
 net/core/Makefile            |   1 +
 net/core/devlink.c           | 856 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1107 insertions(+)
 create mode 100644 include/net/devlink.h
 create mode 100644 include/uapi/linux/devlink.h
 create mode 100644 net/core/devlink.c

diff --git a/MAINTAINERS b/MAINTAINERS
index f678c37..c4efa8d7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3501,6 +3501,14 @@ F:	include/linux/device-mapper.h
 F:	include/linux/dm-*.h
 F:	include/uapi/linux/dm-*.h
 
+DEVLINK
+M:	Jiri Pirko <jiri@mellanox.com>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	net/core/devlink.c
+F:	include/net/devlink.h
+F:	include/uapi/linux/devlink.h
+
 DIALOG SEMICONDUCTOR DRIVERS
 M:	Support Opensource <support.opensource@diasemi.com>
 W:	http://www.dialog-semiconductor.com/products
diff --git a/include/net/devlink.h b/include/net/devlink.h
new file mode 100644
index 0000000..a7888e2
--- /dev/null
+++ b/include/net/devlink.h
@@ -0,0 +1,152 @@
+/*
+ * include/net/devlink.h - Network physical device Netlink interface
+ * Copyright (c) 2016 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#ifndef _NET_DEVLINK_H_
+#define _NET_DEVLINK_H_
+
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/netdevice.h>
+#include <net/net_namespace.h>
+#include <uapi/linux/devlink.h>
+
+struct devlink_ops;
+
+struct devlink {
+	struct list_head list;
+	struct list_head port_list;
+	int index;
+	const struct devlink_ops *ops;
+	struct device dev;
+	possible_net_t _net;
+	char priv[0] __aligned(NETDEV_ALIGN);
+};
+
+struct devlink_port {
+	struct list_head list;
+	struct devlink *devlink;
+	unsigned index;
+	enum devlink_port_type type;
+	enum devlink_port_type desired_type;
+	void *type_dev;
+};
+
+struct devlink_ops {
+	size_t priv_size;
+	int (*port_type_set)(struct devlink_port *devlink_port,
+			     enum devlink_port_type port_type);
+};
+
+static inline void *devlink_priv(struct devlink *devlink)
+{
+	BUG_ON(!devlink);
+	return &devlink->priv;
+}
+
+static inline struct devlink *priv_to_devlink(void *priv)
+{
+	BUG_ON(!priv);
+	return container_of(priv, struct devlink, priv);
+}
+
+static inline struct device *devlink_dev(struct devlink *devlink)
+{
+	return &devlink->dev;
+}
+
+static inline void set_devlink_dev(struct devlink *devlink, struct device *dev)
+{
+	devlink->dev.parent = dev;
+}
+
+static inline const char *devlink_name(const struct devlink *devlink)
+{
+	return dev_name(&devlink->dev);
+}
+
+struct ib_device;
+
+#if IS_ENABLED(CONFIG_NET_DEVLINK)
+
+struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
+int devlink_register(struct devlink *devlink);
+void devlink_unregister(struct devlink *devlink);
+void devlink_free(struct devlink *devlink);
+void devlink_hwmsg_notify(struct devlink *devlink,
+			  const char *buf, size_t buf_len,
+			  enum devlink_hwmsg_type type,
+			  enum devlink_hwmsg_dir dir,
+			  gfp_t gfp_mask);
+int devlink_port_register(struct devlink *devlink,
+			  struct devlink_port *devlink_port,
+			  unsigned int port_index);
+void devlink_port_unregister(struct devlink_port *devlink_port);
+void devlink_port_type_eth_set(struct devlink_port *devlink_port,
+			       struct net_device *netdev);
+void devlink_port_type_ib_set(struct devlink_port *devlink_port,
+			      struct ib_device *ibdev);
+void devlink_port_type_clear(struct devlink_port *devlink_port);
+
+#else
+
+static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
+					    size_t priv_size)
+{
+	return kzalloc(sizeof(*devlink) + priv_size, GFP_KERNEL);
+}
+
+static inline int devlink_register(struct devlink *devlink)
+{
+	return 0;
+}
+
+static inline void devlink_unregister(struct devlink *devlink)
+{
+}
+
+static inline void devlink_free(struct devlink *devlink)
+{
+	kfree(devlink);
+}
+
+static inline void devlink_hwmsg_notify(struct devlink *devlink,
+					const char *buf, size_t buf_len,
+					enum devlink_hwmsg_type type,
+					enum devlink_hwmsg_dir dir,
+					gfp_t gfp_mask)
+{
+}
+
+static inline int devlink_port_register(struct devlink *devlink,
+					struct devlink_port *devlink_port,
+					unsigned int port_index)
+{
+	return 0;
+}
+
+static inline void devlink_port_type_eth_set(struct devlink_port *devlink_port,
+					     struct net_device *netdev)
+{
+}
+
+static inline void devlink_port_type_ib_set(struct devlink_port *devlink_port,
+					    struct ib_device *ibdev)
+{
+}
+
+static inline void devlink_port_type_clear(struct devlink_port *devlink_port)
+{
+}
+
+#endif
+
+#endif /* _NET_DEVLINK_H_ */
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
new file mode 100644
index 0000000..1d9f999
--- /dev/null
+++ b/include/uapi/linux/devlink.h
@@ -0,0 +1,83 @@
+/*
+ * include/uapi/linux/devlink.h - Network physical device Netlink interface
+ * Copyright (c) 2016 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _UAPI_LINUX_DEVLINK_H_
+#define _UAPI_LINUX_DEVLINK_H_
+
+#define DEVLINK_GENL_NAME "devlink"
+#define DEVLINK_GENL_VERSION 0x1
+#define DEVLINK_GENL_MCGRP_CONFIG_NAME "config"
+#define DEVLINK_GENL_MCGRP_HWMSG_NAME "hwmsg"
+
+enum devlink_command {
+	/* don't change the order or add anything between, this is ABI! */
+	DEVLINK_CMD_UNSPEC,
+
+	DEVLINK_CMD_GET,		/* can dump */
+	DEVLINK_CMD_SET,
+	DEVLINK_CMD_NEW,
+	DEVLINK_CMD_DEL,
+
+	DEVLINK_CMD_HWMSG_NEW,
+
+	DEVLINK_CMD_PORT_GET,		/* can dump */
+	DEVLINK_CMD_PORT_SET,
+	DEVLINK_CMD_PORT_NEW,
+	DEVLINK_CMD_PORT_DEL,
+
+	/* add new commands above here */
+
+	__DEVLINK_CMD_MAX,
+	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
+};
+
+enum devlink_hwmsg_type {
+	DEVLINK_HWMSG_TYPE_TMP, /* temporary, until first message type is introduced */
+};
+
+enum devlink_hwmsg_dir {
+	DEVLINK_HWMSG_DIR_TO_HW,
+	DEVLINK_HWMSG_DIR_FROM_HW,
+};
+
+enum devlink_port_type {
+	DEVLINK_PORT_TYPE_NOTSET,
+	DEVLINK_PORT_TYPE_AUTO,
+	DEVLINK_PORT_TYPE_ETH,
+	DEVLINK_PORT_TYPE_IB,
+};
+
+enum devlink_attr {
+	/* don't change the order or add anything between, this is ABI! */
+	DEVLINK_ATTR_UNSPEC,
+
+	DEVLINK_ATTR_INDEX,			/* u32 */
+#define DEVLINK_ATTR_NAME_MAX_LEN 32
+	DEVLINK_ATTR_NAME,			/* string */
+	DEVLINK_ATTR_BUS_NAME,			/* string */
+	DEVLINK_ATTR_DEV_NAME,			/* string */
+	DEVLINK_ATTR_HWMSG_PAYLOAD,		/* data */
+	DEVLINK_ATTR_HWMSG_TYPE,		/* u32 */
+	DEVLINK_ATTR_HWMSG_DIR,			/* u8 */
+	DEVLINK_ATTR_PORT_INDEX,		/* u32 */
+	DEVLINK_ATTR_PORT_TYPE,			/* u16 */
+	DEVLINK_ATTR_PORT_DESIRED_TYPE,		/* u16 */
+	DEVLINK_ATTR_PORT_NETDEV_IFINDEX,	/* u32 */
+	DEVLINK_ATTR_PORT_NETDEV_NAME,		/* string */
+	DEVLINK_ATTR_PORT_IBDEV_NAME,		/* string */
+
+	/* add new attributes above here, update the policy in devlink.c */
+
+	__DEVLINK_ATTR_MAX,
+	DEVLINK_ATTR_MAX = __DEVLINK_ATTR_MAX - 1
+};
+
+#endif /* _UAPI_LINUX_DEVLINK_H_ */
diff --git a/net/Kconfig b/net/Kconfig
index 1743546..7e29ced 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -392,6 +392,13 @@ config LWTUNNEL
 	  weight tunnel endpoint. Tunnel encapsulation parameters are stored
 	  with light weight tunnel state associated with fib routes.
 
+config NET_DEVLINK
+	tristate "Network physical/parent device Netlink interface"
+	help
+	  Network physical/parent device Netlink interface provides
+	  infrastructure to support access to physical chip-wide config and
+	  monitoring.
+
 endif   # if NET
 
 # Used by archs to tell that they support BPF_JIT
diff --git a/net/core/Makefile b/net/core/Makefile
index 0b835de..1f200bf 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -24,3 +24,4 @@ obj-$(CONFIG_NET_PTP_CLASSIFY) += ptp_classifier.o
 obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
 obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
 obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
+obj-$(CONFIG_NET_DEVLINK) += devlink.o
diff --git a/net/core/devlink.c b/net/core/devlink.c
new file mode 100644
index 0000000..da01de5
--- /dev/null
+++ b/net/core/devlink.c
@@ -0,0 +1,856 @@
+/*
+ * net/core/devlink.c - Network physical/parent device Netlink interface
+ *
+ * Heavily inspired by net/wireless/
+ * Copyright (c) 2016 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/gfp.h>
+#include <linux/device.h>
+#include <linux/list.h>
+#include <linux/netdevice.h>
+#include <linux/sysfs.h>
+#include <linux/bitops.h>
+#include <rdma/ib_verbs.h>
+#include <net/netlink.h>
+#include <net/genetlink.h>
+#include <net/rtnetlink.h>
+#include <net/net_namespace.h>
+#include <net/sock.h>
+#include <net/devlink.h>
+
+static LIST_HEAD(devlink_list);
+static DEFINE_MUTEX(devlink_mutex); /* protects devlink and port lists */
+
+static struct net *devlink_net(const struct devlink *devlink)
+{
+	return read_pnet(&devlink->_net);
+}
+
+static void devlink_net_set(struct devlink *devlink, struct net *net)
+{
+	write_pnet(&devlink->_net, net);
+}
+
+static bool devlink_name_exists(struct net *net, const char *name)
+{
+	struct devlink *devlink;
+
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (strcmp(devlink_name(devlink), name) == 0 &&
+		    net_eq(devlink_net(devlink), net))
+			return true;
+	}
+	return false;
+}
+
+static struct devlink *devlink_get_by_index(struct net *net, int index)
+{
+	struct devlink *devlink;
+
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (devlink->index == index &&
+		    net_eq(devlink_net(devlink), net))
+			return devlink;
+	}
+	return NULL;
+}
+
+static struct devlink *devlink_get_from_attrs(struct net *net,
+					      struct nlattr **attrs)
+{
+	if (attrs[DEVLINK_ATTR_INDEX]) {
+		u32 index = nla_get_u32(attrs[DEVLINK_ATTR_INDEX]);
+		struct devlink *devlink;
+
+		devlink = devlink_get_by_index(net, index);
+		if (!devlink)
+			return ERR_PTR(-ENODEV);
+		return devlink;
+	}
+	return ERR_PTR(-EINVAL);
+}
+
+static struct devlink *devlink_get_from_info(struct genl_info *info)
+{
+	return devlink_get_from_attrs(genl_info_net(info), info->attrs);
+}
+
+static struct devlink_port *devlink_port_get_by_index(struct devlink *devlink,
+						      int port_index)
+{
+	struct devlink_port *devlink_port;
+
+	list_for_each_entry(devlink_port, &devlink->port_list, list) {
+		if (devlink_port->index == port_index)
+			return devlink_port;
+	}
+	return NULL;
+}
+
+static bool devlink_port_index_exists(struct devlink *devlink, int port_index)
+{
+	return devlink_port_get_by_index(devlink, port_index);
+}
+
+static struct devlink_port *devlink_port_get_from_attrs(struct devlink *devlink,
+							struct nlattr **attrs)
+{
+	if (attrs[DEVLINK_ATTR_PORT_INDEX]) {
+		u32 port_index = nla_get_u32(attrs[DEVLINK_ATTR_PORT_INDEX]);
+		struct devlink_port *devlink_port;
+
+		devlink_port = devlink_port_get_by_index(devlink, port_index);
+		if (!devlink_port)
+			return ERR_PTR(-ENODEV);
+		return devlink_port;
+	}
+	return ERR_PTR(-EINVAL);
+}
+
+static struct devlink_port *devlink_port_get_from_info(struct devlink *devlink,
+						       struct genl_info *info)
+{
+	return devlink_port_get_from_attrs(devlink, info->attrs);
+}
+
+#define DEVLINK_NL_FLAG_NEED_PORT	BIT(0)
+
+static int devlink_nl_pre_doit(const struct genl_ops *ops,
+			       struct sk_buff *skb, struct genl_info *info)
+{
+	struct devlink *devlink;
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	devlink = devlink_get_from_info(info);
+	if (IS_ERR(devlink)) {
+		err = PTR_ERR(devlink);
+		goto err_out;
+	}
+	info->user_ptr[0] = devlink;
+	if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) {
+		struct devlink_port *devlink_port;
+
+		devlink_port = devlink_port_get_from_info(devlink, info);
+		if (IS_ERR(devlink_port)) {
+			err = PTR_ERR(devlink_port);
+			goto err_out;
+		}
+		info->user_ptr[1] = devlink_port;
+	}
+	return 0;
+
+err_out:
+	mutex_unlock(&devlink_mutex);
+	return err;
+}
+
+static void devlink_nl_post_doit(const struct genl_ops *ops,
+				 struct sk_buff *skb, struct genl_info *info)
+{
+	mutex_unlock(&devlink_mutex);
+}
+
+static struct genl_family devlink_nl_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= DEVLINK_GENL_NAME,
+	.version	= DEVLINK_GENL_VERSION,
+	.maxattr	= DEVLINK_ATTR_MAX,
+	.netnsok	= true,
+	.pre_doit	= devlink_nl_pre_doit,
+	.post_doit	= devlink_nl_post_doit,
+};
+
+enum devlink_multicast_groups {
+	DEVLINK_MCGRP_CONFIG,
+	DEVLINK_MCGRP_HWMSG,
+};
+
+static const struct genl_multicast_group devlink_nl_mcgrps[] = {
+	[DEVLINK_MCGRP_CONFIG] = { .name = DEVLINK_GENL_MCGRP_CONFIG_NAME },
+	[DEVLINK_MCGRP_HWMSG] = { .name = DEVLINK_GENL_MCGRP_HWMSG_NAME },
+};
+
+static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink,
+			   enum devlink_command cmd, u32 portid,
+			   u32 seq, int flags)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (nla_put_u32(msg, DEVLINK_ATTR_INDEX, devlink->index))
+		goto nla_put_failure;
+	if (nla_put_string(msg, DEVLINK_ATTR_NAME, devlink_name(devlink)))
+		goto nla_put_failure;
+
+	if (devlink->dev.parent) {
+		struct device *dev = devlink->dev.parent;
+
+		if (nla_put_string(msg, DEVLINK_ATTR_BUS_NAME, dev->bus->name))
+			goto nla_put_failure;
+		if (nla_put_string(msg, DEVLINK_ATTR_DEV_NAME, dev_name(dev)))
+			goto nla_put_failure;
+	}
+
+	genlmsg_end(msg, hdr);
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+static void devlink_notify(struct devlink *devlink, enum devlink_command cmd)
+{
+	struct sk_buff *msg;
+	int err;
+
+	WARN_ON(cmd != DEVLINK_CMD_NEW && cmd != DEVLINK_CMD_DEL);
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return;
+
+	err = devlink_nl_fill(msg, devlink, cmd, 0, 0, 0);
+	if (err) {
+		nlmsg_free(msg);
+		return;
+	}
+
+	genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+				msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
+}
+
+static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
+				struct devlink_port *devlink_port,
+				enum devlink_command cmd, u32 portid,
+				u32 seq, int flags)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (nla_put_u32(msg, DEVLINK_ATTR_INDEX, devlink->index))
+		goto nla_put_failure;
+	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index))
+		goto nla_put_failure;
+	if (nla_put_u16(msg, DEVLINK_ATTR_PORT_TYPE, devlink_port->type))
+		goto nla_put_failure;
+	if (devlink_port->desired_type != DEVLINK_PORT_TYPE_NOTSET &&
+	    nla_put_u16(msg, DEVLINK_ATTR_PORT_DESIRED_TYPE,
+			devlink_port->desired_type))
+		goto nla_put_failure;
+	if (devlink_port->type == DEVLINK_PORT_TYPE_ETH) {
+		struct net_device *netdev = devlink_port->type_dev;
+
+		if (netdev &&
+		    (nla_put_u32(msg, DEVLINK_ATTR_PORT_NETDEV_IFINDEX,
+				 netdev->ifindex) ||
+		     nla_put_string(msg, DEVLINK_ATTR_PORT_NETDEV_NAME,
+				    netdev->name)))
+			goto nla_put_failure;
+	}
+	if (devlink_port->type == DEVLINK_PORT_TYPE_IB) {
+		struct ib_device *ibdev = devlink_port->type_dev;
+
+		if (ibdev &&
+		    nla_put_string(msg, DEVLINK_ATTR_PORT_IBDEV_NAME,
+				   ibdev->name))
+			goto nla_put_failure;
+	}
+
+	genlmsg_end(msg, hdr);
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+static void devlink_port_notify(struct devlink_port *devlink_port,
+				enum devlink_command cmd)
+{
+	struct devlink *devlink = devlink_port->devlink;
+	struct sk_buff *msg;
+	int err;
+
+	WARN_ON(cmd != DEVLINK_CMD_PORT_NEW && cmd != DEVLINK_CMD_PORT_DEL);
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return;
+
+	err = devlink_nl_port_fill(msg, devlink, devlink_port, cmd, 0, 0, 0);
+	if (err) {
+		nlmsg_free(msg);
+		return;
+	}
+
+	genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+				msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
+}
+
+static int devlink_hwmsg_nl_fill(struct sk_buff *msg, struct devlink *devlink,
+				 const char *buf, size_t buf_len,
+				 enum devlink_hwmsg_type type,
+				 enum devlink_hwmsg_dir dir,
+				 enum devlink_command cmd, u32 portid,
+				 u32 seq, int flags)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (nla_put_u32(msg, DEVLINK_ATTR_INDEX, devlink->index))
+		goto nla_put_failure;
+	if (nla_put(msg, DEVLINK_ATTR_HWMSG_PAYLOAD, buf_len, buf))
+		goto nla_put_failure;
+	if (nla_put_u32(msg, DEVLINK_ATTR_HWMSG_TYPE, type))
+		goto nla_put_failure;
+	if (nla_put_u8(msg, DEVLINK_ATTR_HWMSG_DIR, dir))
+		goto nla_put_failure;
+
+	genlmsg_end(msg, hdr);
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+/**
+ *	devlink_hwmsg_notify - Send HW message notification via netlink
+ *
+ *	@devlink: devlink
+ *	@buf: HW message
+ *	@buf_len: message length
+ *	@type: message type
+ *	@dir: direction of message flow
+ *	@gfp_mask: alloc mask
+ */
+void devlink_hwmsg_notify(struct devlink *devlink,
+			  const char *buf, size_t buf_len,
+			  enum devlink_hwmsg_type type,
+			  enum devlink_hwmsg_dir dir,
+			  gfp_t gfp_mask)
+{
+	struct sk_buff *msg;
+	int err;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, gfp_mask);
+	if (!msg)
+		return;
+
+	err = devlink_hwmsg_nl_fill(msg, devlink, buf, buf_len, type, dir,
+				    DEVLINK_CMD_HWMSG_NEW, 0, 0, 0);
+	if (err) {
+		nlmsg_free(msg);
+		return;
+	}
+
+	genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+				msg, 0, DEVLINK_MCGRP_HWMSG, GFP_KERNEL);
+}
+EXPORT_SYMBOL_GPL(devlink_hwmsg_notify);
+
+static int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct sk_buff *msg;
+	int err;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW,
+			      info->snd_portid, info->snd_seq, 0);
+	if (err) {
+		nlmsg_free(msg);
+		return err;
+	}
+
+	return genlmsg_reply(msg, info);
+}
+
+static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg,
+				     struct netlink_callback *cb)
+{
+	struct devlink *devlink;
+	int start = cb->args[0];
+	int idx = 0;
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			continue;
+		if (idx < start) {
+			idx++;
+			continue;
+		}
+		err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW,
+				      NETLINK_CB(cb->skb).portid,
+				      cb->nlh->nlmsg_seq, NLM_F_MULTI);
+		if (err)
+			goto out;
+		idx++;
+	}
+out:
+	mutex_unlock(&devlink_mutex);
+
+	cb->args[0] = idx;
+	return msg->len;
+}
+
+static int devlink_rename(struct devlink *devlink, const char *newname)
+{
+	int err;
+
+	if (strcmp(newname, devlink_name(devlink)) == 0)
+		return 0;
+	if (devlink_name_exists(devlink_net(devlink), newname))
+		return -EINVAL;
+	if (!dev_valid_name(newname))
+		return -EINVAL;
+	err = device_rename(&devlink->dev, newname);
+	if (err)
+		return err;
+	devlink_notify(devlink, DEVLINK_CMD_NEW);
+	return 0;
+}
+
+static int devlink_nl_cmd_set_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	int err;
+
+	if (info->attrs[DEVLINK_ATTR_NAME]) {
+		err = devlink_rename(devlink,
+				     nla_data(info->attrs[DEVLINK_ATTR_NAME]));
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_port *devlink_port = info->user_ptr[1];
+	struct sk_buff *msg;
+	int err;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	err = devlink_nl_port_fill(msg, devlink, devlink_port,
+				   DEVLINK_CMD_PORT_NEW,
+				   info->snd_portid, info->snd_seq, 0);
+	if (err) {
+		nlmsg_free(msg);
+		return err;
+	}
+
+	return genlmsg_reply(msg, info);
+}
+
+static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg,
+					  struct netlink_callback *cb)
+{
+	struct devlink *devlink;
+	struct devlink_port *devlink_port;
+	int start = cb->args[0];
+	int idx = 0;
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			continue;
+		list_for_each_entry(devlink_port, &devlink->port_list, list) {
+			if (idx < start) {
+				idx++;
+				continue;
+			}
+			err = devlink_nl_port_fill(msg, devlink, devlink_port,
+						   DEVLINK_CMD_NEW,
+						   NETLINK_CB(cb->skb).portid,
+						   cb->nlh->nlmsg_seq,
+						   NLM_F_MULTI);
+			if (err)
+				goto out;
+			idx++;
+		}
+	}
+out:
+	mutex_unlock(&devlink_mutex);
+
+	cb->args[0] = idx;
+	return msg->len;
+}
+
+static int devlink_port_type_set(struct devlink *devlink,
+				 struct devlink_port *devlink_port,
+				 enum devlink_port_type port_type)
+
+{
+	int err;
+
+	if (devlink->ops && devlink->ops->port_type_set) {
+		if (port_type == DEVLINK_PORT_TYPE_NOTSET)
+			return -EINVAL;
+		err = devlink->ops->port_type_set(devlink_port, port_type);
+		if (err)
+			return err;
+		devlink_port->desired_type = port_type;
+		devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
+		return 0;
+	}
+	return -EOPNOTSUPP;
+}
+
+static int devlink_nl_cmd_port_set_doit(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_port *devlink_port = info->user_ptr[1];
+	int err;
+
+	if (info->attrs[DEVLINK_ATTR_PORT_TYPE]) {
+		enum devlink_port_type port_type;
+
+		port_type = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_TYPE]);
+		err = devlink_port_type_set(devlink, devlink_port, port_type);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
+	[DEVLINK_ATTR_INDEX] = { .type = NLA_U32 },
+	[DEVLINK_ATTR_NAME] = { .type = NLA_NUL_STRING,
+				.len = DEVLINK_ATTR_NAME_MAX_LEN },
+	[DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32 },
+	[DEVLINK_ATTR_PORT_TYPE] = { .type = NLA_U16 },
+};
+
+static const struct genl_ops devlink_nl_ops[] = {
+	{
+		.cmd = DEVLINK_CMD_GET,
+		.doit = devlink_nl_cmd_get_doit,
+		.dumpit = devlink_nl_cmd_get_dumpit,
+		.policy = devlink_nl_policy,
+		/* can be retrieved by unprivileged users */
+	},
+	{
+		.cmd = DEVLINK_CMD_SET,
+		.doit = devlink_nl_cmd_set_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = DEVLINK_CMD_PORT_GET,
+		.doit = devlink_nl_cmd_port_get_doit,
+		.dumpit = devlink_nl_cmd_port_get_dumpit,
+		.policy = devlink_nl_policy,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_PORT,
+		/* can be retrieved by unprivileged users */
+	},
+	{
+		.cmd = DEVLINK_CMD_PORT_SET,
+		.doit = devlink_nl_cmd_port_set_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_PORT,
+	},
+};
+
+static struct class devlink_class;
+
+/**
+ *	devlink_alloc - Allocate new devlink instance resources
+ *
+ *	@ops: ops
+ *	@priv_size: size of user private data
+ *
+ *	Allocate new devlink instance resources, including devlink index
+ *	and name.
+ */
+struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
+{
+	static atomic_t dev_counter = ATOMIC_INIT(0);
+	struct devlink *devlink;
+
+	devlink = kzalloc(sizeof(*devlink) + priv_size, GFP_KERNEL);
+	if (!devlink)
+		return NULL;
+	devlink->ops = ops;
+	devlink_net_set(devlink, &init_net);
+
+different_name:
+	devlink->index = atomic_inc_return(&dev_counter);
+	if (devlink->index < 0) {
+		/* wrapped */
+		atomic_dec(&dev_counter);
+		kfree(devlink);
+		return NULL;
+	}
+	/* atomic_inc_return makes it start at 1, make it start at 0 */
+	devlink->index--;
+
+	dev_set_name(&devlink->dev, DEVLINK_GENL_NAME "%d", devlink->index);
+	if (devlink_name_exists(devlink_net(devlink), devlink_name(devlink)))
+		goto different_name;
+
+	INIT_LIST_HEAD(&devlink->port_list);
+	device_initialize(&devlink->dev);
+	devlink->dev.class = &devlink_class;
+	devlink->dev.platform_data = devlink;
+	return devlink;
+}
+EXPORT_SYMBOL_GPL(devlink_alloc);
+
+/**
+ *	devlink_register - Register devlink instance
+ *
+ *	@devlink: devlink
+ */
+int devlink_register(struct devlink *devlink)
+{
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	err = device_add(&devlink->dev);
+	if (err)
+		goto unlock;
+	list_add_tail(&devlink->list, &devlink_list);
+	devlink_notify(devlink, DEVLINK_CMD_NEW);
+unlock:
+	mutex_unlock(&devlink_mutex);
+	return err;
+}
+EXPORT_SYMBOL_GPL(devlink_register);
+
+/**
+ *	devlink_unregister - Unregister devlink instance
+ *
+ *	@devlink: devlink
+ */
+void devlink_unregister(struct devlink *devlink)
+{
+	mutex_lock(&devlink_mutex);
+	devlink_notify(devlink, DEVLINK_CMD_DEL);
+	list_del(&devlink->list);
+	device_del(&devlink->dev);
+	mutex_unlock(&devlink_mutex);
+}
+EXPORT_SYMBOL_GPL(devlink_unregister);
+
+/**
+ *	devlink_free - Free devlink instance resources
+ *
+ *	@devlink: devlink
+ */
+void devlink_free(struct devlink *devlink)
+{
+	put_device(&devlink->dev);
+}
+EXPORT_SYMBOL_GPL(devlink_free);
+
+/**
+ *	devlink_port_register - Register devlink port
+ *
+ *	@devlink: devlink
+ *	@devlink_port: devlink port
+ *	@port_index
+ *
+ *	Register devlink port with provided port index. User can use
+ *	any indexing, even hw-related one. devlink_port structure
+ *	is convenient to be embedded inside user driver private structure.
+ */
+int devlink_port_register(struct devlink *devlink,
+			  struct devlink_port *devlink_port,
+			  unsigned int port_index)
+{
+	mutex_lock(&devlink_mutex);
+	if (devlink_port_index_exists(devlink, port_index)) {
+		mutex_unlock(&devlink_mutex);
+		return -EEXIST;
+	}
+	devlink_port->devlink = devlink;
+	devlink_port->index = port_index;
+	devlink_port->type = DEVLINK_PORT_TYPE_NOTSET;
+	list_add_tail(&devlink_port->list, &devlink->port_list);
+	mutex_unlock(&devlink_mutex);
+	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_port_register);
+
+/**
+ *	devlink_port_unregister - Unregister devlink port
+ *
+ *	@devlink_port: devlink port
+ */
+void devlink_port_unregister(struct devlink_port *devlink_port)
+{
+	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL);
+	mutex_lock(&devlink_mutex);
+	list_del(&devlink_port->list);
+	mutex_unlock(&devlink_mutex);
+}
+EXPORT_SYMBOL_GPL(devlink_port_unregister);
+
+static void __devlink_port_type_set(struct devlink_port *devlink_port,
+				    enum devlink_port_type type,
+				    void *type_dev)
+{
+	devlink_port->type = type;
+	devlink_port->type_dev = type_dev;
+	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
+}
+
+/**
+ *	devlink_port_type_eth_set - Set port type to Ethernet
+ *
+ *	@devlink_port: devlink port
+ *	@netdev: related netdevice
+ */
+void devlink_port_type_eth_set(struct devlink_port *devlink_port,
+			       struct net_device *netdev)
+{
+	return __devlink_port_type_set(devlink_port,
+				       DEVLINK_PORT_TYPE_ETH, netdev);
+}
+EXPORT_SYMBOL_GPL(devlink_port_type_eth_set);
+
+/**
+ *	devlink_port_type_ib_set - Set port type to InfiniBand
+ *
+ *	@devlink_port: devlink port
+ *	@ibdev: related IB device
+ */
+void devlink_port_type_ib_set(struct devlink_port *devlink_port,
+			      struct ib_device *ibdev)
+{
+	return __devlink_port_type_set(devlink_port,
+				       DEVLINK_PORT_TYPE_IB, ibdev);
+}
+EXPORT_SYMBOL_GPL(devlink_port_type_ib_set);
+
+/**
+ *	devlink_port_type_clear - Clear port type
+ *
+ *	@devlink_port: devlink port
+ */
+void devlink_port_type_clear(struct devlink_port *devlink_port)
+{
+	return __devlink_port_type_set(devlink_port,
+				       DEVLINK_PORT_TYPE_NOTSET, NULL);
+}
+EXPORT_SYMBOL_GPL(devlink_port_type_clear);
+
+static void __devlink_free(struct devlink *devlink)
+{
+	kfree(devlink);
+}
+
+static struct devlink *dev_to_devlink(struct device *dev)
+{
+	return container_of(dev, struct devlink, dev);
+}
+
+static ssize_t index_show(struct device *dev,
+			  struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", dev_to_devlink(dev)->index);
+}
+static DEVICE_ATTR_RO(index);
+
+static ssize_t name_show(struct device *dev,
+			 struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%s\n", dev_name(dev));
+}
+static DEVICE_ATTR_RO(name);
+
+static struct attribute *devlink_attrs[] = {
+	&dev_attr_index.attr,
+	&dev_attr_name.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(devlink);
+
+static void devlink_dev_release(struct device *dev)
+{
+	__devlink_free(dev_to_devlink(dev));
+}
+
+static const void *devlink_namespace(struct device *dev)
+{
+	return devlink_net(dev_to_devlink(dev));
+}
+
+static struct class devlink_class = {
+	.name = DEVLINK_GENL_NAME,
+	.owner = THIS_MODULE,
+	.dev_release = devlink_dev_release,
+	.dev_groups = devlink_groups,
+	.ns_type = &net_ns_type_operations,
+	.namespace = devlink_namespace,
+};
+
+static int __init devlink_module_init(void)
+{
+	int err;
+
+	err = class_register(&devlink_class);
+	if (err)
+		return err;
+	err = genl_register_family_with_ops_groups(&devlink_nl_family,
+						   devlink_nl_ops,
+						   devlink_nl_mcgrps);
+	if (err)
+		goto err_genl_register;
+	return 0;
+
+err_genl_register:
+	class_unregister(&devlink_class);
+	return err;
+}
+
+static void __exit devlink_module_exit(void)
+{
+	genl_unregister_family(&devlink_nl_family);
+	class_unregister(&devlink_class);
+}
+
+module_init(devlink_module_init);
+module_exit(devlink_module_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jiri Pirko <jiri@mellanox.com>");
+MODULE_DESCRIPTION("Network physical device Netlink interface");
+MODULE_ALIAS_GENL_FAMILY(DEVLINK_GENL_NAME);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [patch net-next RFC 2/6] mlxsw: Implement devlink interface
  2016-02-03 10:47 [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it Jiri Pirko
  2016-02-03 10:47 ` [patch net-next RFC 1/6] Introduce devlink infrastructure Jiri Pirko
@ 2016-02-03 10:47 ` Jiri Pirko
  2016-02-03 10:47 ` [patch net-next RFC 3/6] mlxsw: Implement hardware messages notification using devlink Jiri Pirko
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Jiri Pirko @ 2016-02-03 10:47 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

From: Jiri Pirko <jiri@mellanox.com>

Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c     | 24 ++++++++++++++++++------
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 20 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  2 ++
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 20 ++++++++++++++++++++
 4 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 22379eb..57d9655 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -56,6 +56,7 @@
 #include <linux/rcupdate.h>
 #include <linux/slab.h>
 #include <asm/byteorder.h>
+#include <net/devlink.h>
 
 #include "core.h"
 #include "item.h"
@@ -791,6 +792,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	const char *device_kind = mlxsw_bus_info->device_kind;
 	struct mlxsw_core *mlxsw_core;
 	struct mlxsw_driver *mlxsw_driver;
+	struct devlink *devlink;
 	size_t alloc_size;
 	int err;
 
@@ -798,12 +800,13 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (!mlxsw_driver)
 		return -EINVAL;
 	alloc_size = sizeof(*mlxsw_core) + mlxsw_driver->priv_size;
-	mlxsw_core = kzalloc(alloc_size, GFP_KERNEL);
-	if (!mlxsw_core) {
+	devlink = devlink_alloc(NULL, alloc_size);
+	if (!devlink) {
 		err = -ENOMEM;
-		goto err_core_alloc;
+		goto err_devlink_alloc;
 	}
 
+	mlxsw_core = devlink_priv(devlink);
 	INIT_LIST_HEAD(&mlxsw_core->rx_listener_list);
 	INIT_LIST_HEAD(&mlxsw_core->event_listener_list);
 	mlxsw_core->driver = mlxsw_driver;
@@ -841,6 +844,11 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (err)
 		goto err_hwmon_init;
 
+	set_devlink_dev(devlink, mlxsw_bus_info->dev);
+	err = devlink_register(devlink);
+	if (err)
+		goto err_devlink_register;
+
 	err = mlxsw_driver->init(mlxsw_core->driver_priv, mlxsw_core,
 				 mlxsw_bus_info);
 	if (err)
@@ -855,6 +863,8 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 err_debugfs_init:
 	mlxsw_core->driver->fini(mlxsw_core->driver_priv);
 err_driver_init:
+	devlink_unregister(devlink);
+err_devlink_register:
 err_hwmon_init:
 	mlxsw_emad_fini(mlxsw_core);
 err_emad_init:
@@ -864,8 +874,8 @@ err_bus_init:
 err_alloc_lag_mapping:
 	free_percpu(mlxsw_core->pcpu_stats);
 err_alloc_stats:
-	kfree(mlxsw_core);
-err_core_alloc:
+	devlink_free(devlink);
+err_devlink_alloc:
 	mlxsw_core_driver_put(device_kind);
 	return err;
 }
@@ -874,14 +884,16 @@ EXPORT_SYMBOL(mlxsw_core_bus_device_register);
 void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
 {
 	const char *device_kind = mlxsw_core->bus_info->device_kind;
+	struct devlink *devlink = priv_to_devlink(mlxsw_core);
 
 	mlxsw_core_debugfs_fini(mlxsw_core);
 	mlxsw_core->driver->fini(mlxsw_core->driver_priv);
+	devlink_unregister(devlink);
 	mlxsw_emad_fini(mlxsw_core);
 	mlxsw_core->bus->fini(mlxsw_core->bus_priv);
 	kfree(mlxsw_core->lag.mapping);
 	free_percpu(mlxsw_core->pcpu_stats);
-	kfree(mlxsw_core);
+	devlink_free(devlink);
 	mlxsw_core_driver_put(device_kind);
 }
 EXPORT_SYMBOL(mlxsw_core_bus_device_unregister);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 217856b..9d4b06c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -49,6 +49,7 @@
 #include <linux/jiffies.h>
 #include <linux/bitops.h>
 #include <linux/list.h>
+#include <net/devlink.h>
 #include <net/switchdev.h>
 #include <generated/utsrelease.h>
 
@@ -1351,7 +1352,9 @@ static const struct ethtool_ops mlxsw_sp_port_ethtool_ops = {
 
 static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 {
+	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
 	struct mlxsw_sp_port *mlxsw_sp_port;
+	struct devlink_port *devlink_port;
 	struct net_device *dev;
 	bool usable;
 	size_t bytes;
@@ -1360,6 +1363,7 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 	dev = alloc_etherdev(sizeof(struct mlxsw_sp_port));
 	if (!dev)
 		return -ENOMEM;
+	SET_NETDEV_DEV(dev, devlink_dev(devlink));
 	mlxsw_sp_port = netdev_priv(dev);
 	mlxsw_sp_port->dev = dev;
 	mlxsw_sp_port->mlxsw_sp = mlxsw_sp;
@@ -1417,6 +1421,14 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 		goto port_not_usable;
 	}
 
+	devlink_port = &mlxsw_sp_port->devlink_port;
+	err = devlink_port_register(devlink, devlink_port, local_port);
+	if (err) {
+		dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to register devlink port\n",
+			mlxsw_sp_port->local_port);
+		goto err_devlink_port_register;
+	}
+
 	err = mlxsw_sp_port_system_port_mapping_set(mlxsw_sp_port);
 	if (err) {
 		dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to set system port mapping\n",
@@ -1457,6 +1469,8 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 		goto err_register_netdev;
 	}
 
+	devlink_port_type_eth_set(devlink_port, dev);
+
 	err = mlxsw_sp_port_vlan_init(mlxsw_sp_port);
 	if (err)
 		goto err_port_vlan_init;
@@ -1472,6 +1486,8 @@ err_port_admin_status_set:
 err_port_mtu_set:
 err_port_swid_set:
 err_port_system_port_mapping_set:
+	devlink_port_unregister(&mlxsw_sp_port->devlink_port);
+err_devlink_port_register:
 port_not_usable:
 err_port_module_check:
 err_dev_addr_init:
@@ -1505,10 +1521,14 @@ static void mlxsw_sp_port_vports_fini(struct mlxsw_sp_port *mlxsw_sp_port)
 static void mlxsw_sp_port_remove(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 {
 	struct mlxsw_sp_port *mlxsw_sp_port = mlxsw_sp->ports[local_port];
+	struct devlink_port *devlink_port;
 
 	if (!mlxsw_sp_port)
 		return;
+	devlink_port = &mlxsw_sp_port->devlink_port;
+	devlink_port_type_clear(devlink_port);
 	unregister_netdev(mlxsw_sp_port->dev); /* This calls ndo_stop */
+	devlink_port_unregister(devlink_port);
 	mlxsw_sp_port_vports_fini(mlxsw_sp_port);
 	mlxsw_sp_port_switchdev_fini(mlxsw_sp_port);
 	free_percpu(mlxsw_sp_port->pcpu_stats);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 7f42eb1..0df9a19 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -43,6 +43,7 @@
 #include <linux/if_vlan.h>
 #include <linux/list.h>
 #include <net/switchdev.h>
+#include <net/devlink.h>
 
 #include "port.h"
 #include "core.h"
@@ -162,6 +163,7 @@ struct mlxsw_sp_port {
 	unsigned long *untagged_vlans;
 	/* VLAN interfaces */
 	struct list_head vports_list;
+	struct devlink_port devlink_port;
 };
 
 static inline struct mlxsw_sp_port *
diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
index d85960c..7a60a26 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
@@ -43,6 +43,7 @@
 #include <linux/device.h>
 #include <linux/skbuff.h>
 #include <linux/if_vlan.h>
+#include <net/devlink.h>
 #include <net/switchdev.h>
 #include <generated/utsrelease.h>
 
@@ -78,6 +79,7 @@ struct mlxsw_sx_port {
 	struct mlxsw_sx_port_pcpu_stats __percpu *pcpu_stats;
 	struct mlxsw_sx *mlxsw_sx;
 	u8 local_port;
+	struct devlink_port devlink_port;
 };
 
 /* tx_hdr_version
@@ -953,7 +955,9 @@ mlxsw_sx_port_mac_learning_mode_set(struct mlxsw_sx_port *mlxsw_sx_port,
 
 static int mlxsw_sx_port_create(struct mlxsw_sx *mlxsw_sx, u8 local_port)
 {
+	struct devlink *devlink = priv_to_devlink(mlxsw_sx->core);
 	struct mlxsw_sx_port *mlxsw_sx_port;
+	struct devlink_port *devlink_port;
 	struct net_device *dev;
 	bool usable;
 	int err;
@@ -1007,6 +1011,14 @@ static int mlxsw_sx_port_create(struct mlxsw_sx *mlxsw_sx, u8 local_port)
 		goto port_not_usable;
 	}
 
+	devlink_port = &mlxsw_sx_port->devlink_port;
+	err = devlink_port_register(devlink, devlink_port, local_port);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to register devlink port\n",
+			mlxsw_sx_port->local_port);
+		goto err_devlink_port_register;
+	}
+
 	err = mlxsw_sx_port_system_port_mapping_set(mlxsw_sx_port);
 	if (err) {
 		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to set system port mapping\n",
@@ -1064,6 +1076,8 @@ static int mlxsw_sx_port_create(struct mlxsw_sx *mlxsw_sx, u8 local_port)
 		goto err_register_netdev;
 	}
 
+	devlink_port_type_eth_set(devlink_port, dev);
+
 	mlxsw_sx->ports[local_port] = mlxsw_sx_port;
 	return 0;
 
@@ -1075,6 +1089,8 @@ err_port_mtu_set:
 err_port_speed_set:
 err_port_swid_set:
 err_port_system_port_mapping_set:
+	devlink_port_unregister(&mlxsw_sx_port->devlink_port);
+err_devlink_port_register:
 port_not_usable:
 err_port_module_check:
 err_dev_addr_get:
@@ -1087,11 +1103,15 @@ err_alloc_stats:
 static void mlxsw_sx_port_remove(struct mlxsw_sx *mlxsw_sx, u8 local_port)
 {
 	struct mlxsw_sx_port *mlxsw_sx_port = mlxsw_sx->ports[local_port];
+	struct devlink_port *devlink_port;
 
 	if (!mlxsw_sx_port)
 		return;
+	devlink_port = &mlxsw_sx_port->devlink_port;
+	devlink_port_type_clear(devlink_port);
 	unregister_netdev(mlxsw_sx_port->dev); /* This calls ndo_stop */
 	mlxsw_sx_port_swid_set(mlxsw_sx_port, MLXSW_PORT_SWID_DISABLED_PORT);
+	devlink_port_unregister(devlink_port);
 	free_percpu(mlxsw_sx_port->pcpu_stats);
 	free_netdev(mlxsw_sx_port->dev);
 }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [patch net-next RFC 3/6] mlxsw: Implement hardware messages notification using devlink
  2016-02-03 10:47 [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it Jiri Pirko
  2016-02-03 10:47 ` [patch net-next RFC 1/6] Introduce devlink infrastructure Jiri Pirko
  2016-02-03 10:47 ` [patch net-next RFC 2/6] mlxsw: Implement devlink interface Jiri Pirko
@ 2016-02-03 10:47 ` Jiri Pirko
  2016-02-03 10:48 ` [patch net-next RFC 4/6] mlx4: Implement devlink interface Jiri Pirko
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Jiri Pirko @ 2016-02-03 10:47 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

From: Jiri Pirko <jiri@mellanox.com>

Use devlink HW message notification facilities to pass massages going
to and from HW to userspace via Netlink multicast.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 15 +++++++++++----
 include/uapi/linux/devlink.h               |  2 +-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 57d9655..da4e6c9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -65,6 +65,7 @@
 #include "trap.h"
 #include "emad.h"
 #include "reg.h"
+#include "txheader.h"
 
 static LIST_HEAD(mlxsw_core_driver_list);
 static DEFINE_SPINLOCK(mlxsw_core_driver_list_lock);
@@ -1099,7 +1100,11 @@ static int mlxsw_core_reg_access_emad(struct mlxsw_core *mlxsw_core,
 
 	dev_dbg(mlxsw_core->bus_info->dev, "EMAD send (tid=%llx)\n",
 		mlxsw_core->emad.tid);
-	mlxsw_core_buf_dump_dbg(mlxsw_core, skb->data, skb->len);
+	devlink_hwmsg_notify(priv_to_devlink(mlxsw_core),
+			     skb->data + MLXSW_TXHDR_LEN,
+			     skb->len - MLXSW_TXHDR_LEN,
+			     DEVLINK_HWMSG_TYPE_MLX_EMAD,
+			     DEVLINK_HWMSG_DIR_TO_HW, GFP_KERNEL);
 
 	err = mlxsw_emad_transmit(mlxsw_core, skb, &tx_info);
 	if (!err) {
@@ -1109,9 +1114,11 @@ static int mlxsw_core_reg_access_emad(struct mlxsw_core *mlxsw_core,
 
 		dev_dbg(mlxsw_core->bus_info->dev, "EMAD recv (tid=%llx)\n",
 			mlxsw_core->emad.tid - 1);
-		mlxsw_core_buf_dump_dbg(mlxsw_core,
-					mlxsw_core->emad.resp_skb->data,
-					mlxsw_core->emad.resp_skb->len);
+		devlink_hwmsg_notify(priv_to_devlink(mlxsw_core),
+				     mlxsw_core->emad.resp_skb->data,
+				     mlxsw_core->emad.resp_skb->len,
+				     DEVLINK_HWMSG_TYPE_MLX_EMAD,
+				     DEVLINK_HWMSG_DIR_FROM_HW, GFP_KERNEL);
 
 		dev_kfree_skb(mlxsw_core->emad.resp_skb);
 	}
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 1d9f999..761612b 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -40,7 +40,7 @@ enum devlink_command {
 };
 
 enum devlink_hwmsg_type {
-	DEVLINK_HWMSG_TYPE_TMP, /* temporary, until first message type is introduced */
+	DEVLINK_HWMSG_TYPE_MLX_EMAD, /* Mellanox EMAD packet */
 };
 
 enum devlink_hwmsg_dir {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [patch net-next RFC 4/6] mlx4: Implement devlink interface
  2016-02-03 10:47 [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it Jiri Pirko
                   ` (2 preceding siblings ...)
  2016-02-03 10:47 ` [patch net-next RFC 3/6] mlxsw: Implement hardware messages notification using devlink Jiri Pirko
@ 2016-02-03 10:48 ` Jiri Pirko
  2016-02-16 16:43   ` Or Gerlitz
  2016-02-03 10:48 ` [patch net-next RFC 5/6] mlx4: Implement hardware messages notification using devlink Jiri Pirko
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2016-02-03 10:48 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

From: Jiri Pirko <jiri@mellanox.com>

Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/infiniband/hw/mlx4/main.c              |  7 ++++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  8 ++++-
 drivers/net/ethernet/mellanox/mlx4/intf.c      |  9 ++++++
 drivers/net/ethernet/mellanox/mlx4/main.c      | 45 +++++++++++++++++++-------
 drivers/net/ethernet/mellanox/mlx4/mlx4.h      |  2 ++
 include/linux/mlx4/driver.h                    |  3 ++
 6 files changed, 61 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 1c7ab6c..a15a7b3 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -41,6 +41,7 @@
 #include <linux/if_vlan.h>
 #include <net/ipv6.h>
 #include <net/addrconf.h>
+#include <net/devlink.h>
 
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
@@ -2519,6 +2520,9 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	}
 
 	ibdev->ib_active = true;
+	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
+		devlink_port_type_ib_set(mlx4_get_devlink_port(dev, i),
+					 &ibdev->ib_dev);
 
 	if (mlx4_is_mfunc(ibdev->dev))
 		init_pkeys(ibdev);
@@ -2643,7 +2647,10 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void *ibdev_ptr)
 {
 	struct mlx4_ib_dev *ibdev = ibdev_ptr;
 	int p;
+	int i;
 
+	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
+		devlink_port_type_clear(mlx4_get_devlink_port(dev, i));
 	ibdev->ib_active = false;
 	flush_workqueue(wq);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0c7e3f6..17ac3b0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -40,6 +40,7 @@
 #include <net/ip.h>
 #include <net/busy_poll.h>
 #include <net/vxlan.h>
+#include <net/devlink.h>
 
 #include <linux/mlx4/driver.h>
 #include <linux/mlx4/device.h>
@@ -2024,8 +2025,11 @@ void mlx4_en_destroy_netdev(struct net_device *dev)
 	en_dbg(DRV, priv, "Destroying netdev on port:%d\n", priv->port);
 
 	/* Unregister device - this will close the port if it was up */
-	if (priv->registered)
+	if (priv->registered) {
+		devlink_port_type_clear(mlx4_get_devlink_port(mdev->dev,
+							      priv->port));
 		unregister_netdev(dev);
+	}
 
 	if (priv->allocated)
 		mlx4_free_hwq_res(mdev->dev, &priv->res, MLX4_EN_PAGE_SIZE);
@@ -3041,6 +3045,8 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	}
 
 	priv->registered = 1;
+	devlink_port_type_eth_set(mlx4_get_devlink_port(mdev->dev, priv->port),
+				  dev);
 
 	return 0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c b/drivers/net/ethernet/mellanox/mlx4/intf.c
index 0472941..dec77d6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -34,6 +34,7 @@
 #include <linux/slab.h>
 #include <linux/export.h>
 #include <linux/errno.h>
+#include <net/devlink.h>
 
 #include "mlx4.h"
 
@@ -249,3 +250,11 @@ void *mlx4_get_protocol_dev(struct mlx4_dev *dev, enum mlx4_protocol proto, int
 	return result;
 }
 EXPORT_SYMBOL_GPL(mlx4_get_protocol_dev);
+
+struct devlink_port *mlx4_get_devlink_port(struct mlx4_dev *dev, int port)
+{
+	struct mlx4_port_info *info = &mlx4_priv(dev)->port[port];
+
+	return &info->devlink_port;
+}
+EXPORT_SYMBOL_GPL(mlx4_get_devlink_port);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index f1b6d21..a5f54a5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -42,6 +42,7 @@
 #include <linux/io-mapping.h>
 #include <linux/delay.h>
 #include <linux/kmod.h>
+#include <net/devlink.h>
 
 #include <linux/mlx4/device.h>
 #include <linux/mlx4/doorbell.h>
@@ -2847,8 +2848,13 @@ no_msi:
 
 static int mlx4_init_port_info(struct mlx4_dev *dev, int port)
 {
+	struct devlink *devlink = priv_to_devlink(mlx4_priv(dev));
 	struct mlx4_port_info *info = &mlx4_priv(dev)->port[port];
-	int err = 0;
+	int err;
+
+	err = devlink_port_register(devlink, &info->devlink_port, port);
+	if (err)
+		return err;
 
 	info->dev = dev;
 	info->port = port;
@@ -2873,6 +2879,7 @@ static int mlx4_init_port_info(struct mlx4_dev *dev, int port)
 	err = device_create_file(&dev->persist->pdev->dev, &info->port_attr);
 	if (err) {
 		mlx4_err(dev, "Failed to create file for port %d\n", port);
+		devlink_port_unregister(&info->devlink_port);
 		info->port = -1;
 	}
 
@@ -3646,21 +3653,23 @@ err_disable_pdev:
 
 static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 {
+	struct devlink *devlink;
 	struct mlx4_priv *priv;
 	struct mlx4_dev *dev;
 	int ret;
 
 	printk_once(KERN_INFO "%s", mlx4_version);
 
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (!priv)
+	devlink = devlink_alloc(NULL, sizeof(*priv));
+	if (!devlink)
 		return -ENOMEM;
+	priv = devlink_priv(devlink);
 
 	dev       = &priv->dev;
 	dev->persist = kzalloc(sizeof(*dev->persist), GFP_KERNEL);
 	if (!dev->persist) {
-		kfree(priv);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto err_devlink_free;
 	}
 	dev->persist->pdev = pdev;
 	dev->persist->dev = dev;
@@ -3669,14 +3678,24 @@ static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	mutex_init(&dev->persist->device_state_mutex);
 	mutex_init(&dev->persist->interface_state_mutex);
 
+	set_devlink_dev(devlink, &pdev->dev);
+	ret = devlink_register(devlink);
+	if (ret)
+		goto err_persist_free;
+
 	ret =  __mlx4_init_one(pdev, id->driver_data, priv);
-	if (ret) {
-		kfree(dev->persist);
-		kfree(priv);
-	} else {
-		pci_save_state(pdev);
-	}
+	if (ret)
+		goto err_devlink_unregister;
 
+	pci_save_state(pdev);
+	return 0;
+
+err_devlink_unregister:
+	devlink_unregister(devlink);
+err_persist_free:
+	kfree(dev->persist);
+err_devlink_free:
+	devlink_free(devlink);
 	return ret;
 }
 
@@ -3777,6 +3796,7 @@ static void mlx4_remove_one(struct pci_dev *pdev)
 	struct mlx4_dev_persistent *persist = pci_get_drvdata(pdev);
 	struct mlx4_dev  *dev  = persist->dev;
 	struct mlx4_priv *priv = mlx4_priv(dev);
+	struct devlink *devlink = priv_to_devlink(priv);
 	int active_vfs = 0;
 
 	mutex_lock(&persist->interface_state_mutex);
@@ -3807,8 +3827,9 @@ static void mlx4_remove_one(struct pci_dev *pdev)
 
 	pci_release_regions(pdev);
 	pci_disable_device(pdev);
+	devlink_unregister(devlink);
 	kfree(dev->persist);
-	kfree(priv);
+	devlink_free(devlink);
 	pci_set_drvdata(pdev, NULL);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index 7baef52..ef96831 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -45,6 +45,7 @@
 #include <linux/workqueue.h>
 #include <linux/interrupt.h>
 #include <linux/spinlock.h>
+#include <net/devlink.h>
 
 #include <linux/mlx4/device.h>
 #include <linux/mlx4/driver.h>
@@ -828,6 +829,7 @@ struct mlx4_port_info {
 	struct mlx4_roce_gid_table gid_table;
 	int			base_qpn;
 	struct cpu_rmap		*rmap;
+	struct devlink_port	devlink_port;
 };
 
 struct mlx4_sense {
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 2e8af00..bd0e707 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -33,6 +33,7 @@
 #ifndef MLX4_DRIVER_H
 #define MLX4_DRIVER_H
 
+#include <net/devlink.h>
 #include <linux/mlx4/device.h>
 
 struct mlx4_dev;
@@ -89,6 +90,8 @@ int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p);
 
 void *mlx4_get_protocol_dev(struct mlx4_dev *dev, enum mlx4_protocol proto, int port);
 
+struct devlink_port *mlx4_get_devlink_port(struct mlx4_dev *dev, int port);
+
 static inline u64 mlx4_mac_to_u64(u8 *addr)
 {
 	u64 mac = 0;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [patch net-next RFC 5/6] mlx4: Implement hardware messages notification using devlink
  2016-02-03 10:47 [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it Jiri Pirko
                   ` (3 preceding siblings ...)
  2016-02-03 10:48 ` [patch net-next RFC 4/6] mlx4: Implement devlink interface Jiri Pirko
@ 2016-02-03 10:48 ` Jiri Pirko
  2016-02-03 10:48 ` [patch net-next RFC 6/6] mlx4: Implement port type setting via devlink interface Jiri Pirko
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Jiri Pirko @ 2016-02-03 10:48 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

From: Jiri Pirko <jiri@mellanox.com>

Use devlink HW message notification facilities to pass massages going
to and from HW to userspace via Netlink multicast.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/fw.c | 9 +++++++++
 include/uapi/linux/devlink.h            | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index d66c690..2735211 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -36,6 +36,7 @@
 #include <linux/mlx4/cmd.h>
 #include <linux/module.h>
 #include <linux/cache.h>
+#include <net/devlink.h>
 
 #include "fw.h"
 #include "icm.h"
@@ -2734,6 +2735,8 @@ static int mlx4_ACCESS_REG(struct mlx4_dev *dev, u16 reg_id,
 {
 	struct mlx4_cmd_mailbox *inbox, *outbox;
 	struct mlx4_access_reg *inbuf, *outbuf;
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	struct devlink *devlink = priv_to_devlink(priv);
 	int err;
 
 	inbox = mlx4_alloc_cmd_mailbox(dev);
@@ -2760,6 +2763,9 @@ static int mlx4_ACCESS_REG(struct mlx4_dev *dev, u16 reg_id,
 			    ((0x3) << 12));
 
 	memcpy(inbuf->reg_data, reg_data, reg_len);
+	devlink_hwmsg_notify(devlink, reg_data, reg_len,
+			     DEVLINK_HWMSG_TYPE_MLX_CMD_REG,
+			     DEVLINK_HWMSG_DIR_TO_HW, GFP_KERNEL);
 	err = mlx4_cmd_box(dev, inbox->dma, outbox->dma, 0, 0,
 			   MLX4_CMD_ACCESS_REG, MLX4_CMD_TIME_CLASS_C,
 			   MLX4_CMD_WRAPPED);
@@ -2775,6 +2781,9 @@ static int mlx4_ACCESS_REG(struct mlx4_dev *dev, u16 reg_id,
 	}
 
 	memcpy(reg_data, outbuf->reg_data, reg_len);
+	devlink_hwmsg_notify(devlink, reg_data, reg_len,
+			     DEVLINK_HWMSG_TYPE_MLX_CMD_REG,
+			     DEVLINK_HWMSG_DIR_FROM_HW, GFP_KERNEL);
 out:
 	mlx4_free_cmd_mailbox(dev, inbox);
 	mlx4_free_cmd_mailbox(dev, outbox);
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 761612b..f06f4f7 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -41,6 +41,7 @@ enum devlink_command {
 
 enum devlink_hwmsg_type {
 	DEVLINK_HWMSG_TYPE_MLX_EMAD, /* Mellanox EMAD packet */
+	DEVLINK_HWMSG_TYPE_MLX_CMD_REG, /* Mellanox CMD iface register access */
 };
 
 enum devlink_hwmsg_dir {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [patch net-next RFC 6/6] mlx4: Implement port type setting via devlink interface
  2016-02-03 10:47 [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it Jiri Pirko
                   ` (4 preceding siblings ...)
  2016-02-03 10:48 ` [patch net-next RFC 5/6] mlx4: Implement hardware messages notification using devlink Jiri Pirko
@ 2016-02-03 10:48 ` Jiri Pirko
  2016-02-03 13:31 ` [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it Jesper Dangaard Brouer
  2016-02-07 20:18 ` roopa
  7 siblings, 0 replies; 26+ messages in thread
From: Jiri Pirko @ 2016-02-03 10:48 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

From: Jiri Pirko <jiri@mellanox.com>

So far, there has been an mlx4-specific sysfs file allowing user to
change port type to either Ethernet of InfiniBand. This is very
inconvenient.

Allow to expose the same ability to set port type in a generic way
using devlink interface.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 86 +++++++++++++++++++++++--------
 1 file changed, 65 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index a5f54a5..4bac714 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1052,36 +1052,20 @@ static ssize_t show_port_type(struct device *dev,
 	return strlen(buf);
 }
 
-static ssize_t set_port_type(struct device *dev,
-			     struct device_attribute *attr,
-			     const char *buf, size_t count)
+static int __set_port_type(struct mlx4_port_info *info,
+			   enum mlx4_port_type port_type)
 {
-	struct mlx4_port_info *info = container_of(attr, struct mlx4_port_info,
-						   port_attr);
 	struct mlx4_dev *mdev = info->dev;
 	struct mlx4_priv *priv = mlx4_priv(mdev);
 	enum mlx4_port_type types[MLX4_MAX_PORTS];
 	enum mlx4_port_type new_types[MLX4_MAX_PORTS];
-	static DEFINE_MUTEX(set_port_type_mutex);
 	int i;
 	int err = 0;
 
-	mutex_lock(&set_port_type_mutex);
-
-	if (!strcmp(buf, "ib\n"))
-		info->tmp_type = MLX4_PORT_TYPE_IB;
-	else if (!strcmp(buf, "eth\n"))
-		info->tmp_type = MLX4_PORT_TYPE_ETH;
-	else if (!strcmp(buf, "auto\n"))
-		info->tmp_type = MLX4_PORT_TYPE_AUTO;
-	else {
-		mlx4_err(mdev, "%s is not supported port type\n", buf);
-		err = -EINVAL;
-		goto err_out;
-	}
-
 	mlx4_stop_sense(mdev);
 	mutex_lock(&priv->port_mutex);
+	info->tmp_type = port_type;
+
 	/* Possible type is always the one that was delivered */
 	mdev->caps.possible_type[info->port] = info->tmp_type;
 
@@ -1123,6 +1107,37 @@ static ssize_t set_port_type(struct device *dev,
 out:
 	mlx4_start_sense(mdev);
 	mutex_unlock(&priv->port_mutex);
+
+	return err;
+}
+
+static ssize_t set_port_type(struct device *dev,
+			     struct device_attribute *attr,
+			     const char *buf, size_t count)
+{
+	struct mlx4_port_info *info = container_of(attr, struct mlx4_port_info,
+						   port_attr);
+	struct mlx4_dev *mdev = info->dev;
+	enum mlx4_port_type port_type;
+	static DEFINE_MUTEX(set_port_type_mutex);
+	int err;
+
+	mutex_lock(&set_port_type_mutex);
+
+	if (!strcmp(buf, "ib\n")) {
+		port_type = MLX4_PORT_TYPE_IB;
+	} else if (!strcmp(buf, "eth\n")) {
+		port_type = MLX4_PORT_TYPE_ETH;
+	} else if (!strcmp(buf, "auto\n")) {
+		port_type = MLX4_PORT_TYPE_AUTO;
+	} else {
+		mlx4_err(mdev, "%s is not supported port type\n", buf);
+		err = -EINVAL;
+		goto err_out;
+	}
+
+	err = __set_port_type(info, port_type);
+
 err_out:
 	mutex_unlock(&set_port_type_mutex);
 
@@ -3651,6 +3666,35 @@ err_disable_pdev:
 	return err;
 }
 
+static int mlx4_devlink_port_type_set(struct devlink_port *devlink_port,
+				      enum devlink_port_type port_type)
+{
+	struct mlx4_port_info *info = container_of(devlink_port,
+						   struct mlx4_port_info,
+						   devlink_port);
+	enum mlx4_port_type mlx4_port_type;
+
+	switch (port_type) {
+	case DEVLINK_PORT_TYPE_AUTO:
+		mlx4_port_type = MLX4_PORT_TYPE_AUTO;
+		break;
+	case DEVLINK_PORT_TYPE_ETH:
+		mlx4_port_type = MLX4_PORT_TYPE_ETH;
+		break;
+	case DEVLINK_PORT_TYPE_IB:
+		mlx4_port_type = MLX4_PORT_TYPE_IB;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return __set_port_type(info, mlx4_port_type);
+}
+
+static const struct devlink_ops mlx4_devlink_ops = {
+	.port_type_set	= mlx4_devlink_port_type_set,
+};
+
 static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct devlink *devlink;
@@ -3660,7 +3704,7 @@ static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	printk_once(KERN_INFO "%s", mlx4_version);
 
-	devlink = devlink_alloc(NULL, sizeof(*priv));
+	devlink = devlink_alloc(&mlx4_devlink_ops, sizeof(*priv));
 	if (!devlink)
 		return -ENOMEM;
 	priv = devlink_priv(devlink);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-03 10:47 [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it Jiri Pirko
                   ` (5 preceding siblings ...)
  2016-02-03 10:48 ` [patch net-next RFC 6/6] mlx4: Implement port type setting via devlink interface Jiri Pirko
@ 2016-02-03 13:31 ` Jesper Dangaard Brouer
  2016-02-03 13:33   ` Jiri Pirko
  2016-02-07 20:18 ` roopa
  7 siblings, 1 reply; 26+ messages in thread
From: Jesper Dangaard Brouer @ 2016-02-03 13:31 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: brouer, netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, jbenc

On Wed,  3 Feb 2016 11:47:56 +0100
Jiri Pirko <jiri@resnulli.us> wrote:

> From: Jiri Pirko <jiri@mellanox.com>
> 
> There a is need for some userspace API that would allow to expose things
> that are not directly related to any device class like net_device of
> ib_device, but rather chip-wide/switch-ASIC-wide stuff.
> 
> Use cases:
> 1) get/set of port type (Ethernet/InfiniBand)
> 2) monitoring of hardware messages to and from chip
> 3) setting up port splitters - split port into multiple ones and squash again,
>    enables usage of splitter cable
> 4) setting up shared buffers - shared among multiple ports within one chip
> 
> First patch of this set introduces a new generic Netlink based interface,
> called "devlink". It is similar to nl80211 model and it is heavily
> influenced by it, including the API definition. The devlink introduction patch
> implements use cases 1) and 2). Other 2 are in development atm and will
> be addressed by follow-ups.
> 
> It is very convenient for drivers to use devlink, as you can see in other
> patches in this set.
> 
> Counterpart for devlink is userspace tool called "dl". Command line interface
> and outputs are derived from "ip" tool so it should be easy for users to get
> used to it.
> 
> It is available here:
> https://github.com/jpirko/devlink

IHMO this very short command name "dl" is not an advantage.  It is
simply too unspecific and short for a good Google search.  E.g. when
searching for good examples for using "dl".  I think "devlink" would be
better.  If you like short commands do: alias dl="devlink"


> Example usage:
> butter:~$ dl help
> Usage: dl [ OPTIONS ] OBJECT { COMMAND | help }
> where  OBJECT := { dev | port | monitor }
>        OPTIONS := { -v/--verbose }
> 
> butter:~$ dl dev show
> 0: devlink0: bus pci dev 0000:01:00.0
> 
> butter:~$ dl port help
> Usage: dl port show [DEV/PORT_INDEX]
> Usage: dl port set DEV/PORT_INDEX [ type { eth | ib | auto} ]
> 
> butter:~$ dl port show
> devlink0/1: type ib ibdev mlx4_0
> devlink0/2: type ib ibdev mlx4_0
> 
> butter:~$ sudo dl port set devlink0/1 type eth
> 
> butter:~$ dl port show
> devlink0/1: type eth netdev ens4
> devlink0/2: type ib ibdev mlx4_0
> 
> butter:~$ sudo dl port set devlink0/1 type auto
> 
> butter:~$ dl port show
> devlink0/1: type eth(auto) netdev ens4
> devlink0/2: type ib ibdev mlx4_0

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-03 13:31 ` [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it Jesper Dangaard Brouer
@ 2016-02-03 13:33   ` Jiri Pirko
  2016-02-03 15:17     ` Daniel Borkmann
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2016-02-03 13:33 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, jbenc

Wed, Feb 03, 2016 at 02:31:33PM CET, brouer@redhat.com wrote:
>On Wed,  3 Feb 2016 11:47:56 +0100
>Jiri Pirko <jiri@resnulli.us> wrote:
>
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> There a is need for some userspace API that would allow to expose things
>> that are not directly related to any device class like net_device of
>> ib_device, but rather chip-wide/switch-ASIC-wide stuff.
>> 
>> Use cases:
>> 1) get/set of port type (Ethernet/InfiniBand)
>> 2) monitoring of hardware messages to and from chip
>> 3) setting up port splitters - split port into multiple ones and squash again,
>>    enables usage of splitter cable
>> 4) setting up shared buffers - shared among multiple ports within one chip
>> 
>> First patch of this set introduces a new generic Netlink based interface,
>> called "devlink". It is similar to nl80211 model and it is heavily
>> influenced by it, including the API definition. The devlink introduction patch
>> implements use cases 1) and 2). Other 2 are in development atm and will
>> be addressed by follow-ups.
>> 
>> It is very convenient for drivers to use devlink, as you can see in other
>> patches in this set.
>> 
>> Counterpart for devlink is userspace tool called "dl". Command line interface
>> and outputs are derived from "ip" tool so it should be easy for users to get
>> used to it.
>> 
>> It is available here:
>> https://github.com/jpirko/devlink
>
>IHMO this very short command name "dl" is not an advantage.  It is
>simply too unspecific and short for a good Google search.  E.g. when
>searching for good examples for using "dl".  I think "devlink" would be
>better.  If you like short commands do: alias dl="devlink"

I was thinking about using "devlink". Decided to go with shortened
version so this is in-line with "ip". But you have a point.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-03 13:33   ` Jiri Pirko
@ 2016-02-03 15:17     ` Daniel Borkmann
  2016-02-04 13:22       ` Hannes Frederic Sowa
  2016-02-04 19:01       ` Rosen, Rami
  0 siblings, 2 replies; 26+ messages in thread
From: Daniel Borkmann @ 2016-02-03 15:17 UTC (permalink / raw)
  To: Jiri Pirko, Jesper Dangaard Brouer
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, jbenc

On 02/03/2016 02:33 PM, Jiri Pirko wrote:
> Wed, Feb 03, 2016 at 02:31:33PM CET, brouer@redhat.com wrote:
>> On Wed,  3 Feb 2016 11:47:56 +0100
>> Jiri Pirko <jiri@resnulli.us> wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
[...]
>>> It is available here:
>>> https://github.com/jpirko/devlink
>>
>> IHMO this very short command name "dl" is not an advantage.  It is
>> simply too unspecific and short for a good Google search.  E.g. when
>> searching for good examples for using "dl".  I think "devlink" would be
>> better.  If you like short commands do: alias dl="devlink"
>
> I was thinking about using "devlink". Decided to go with shortened
> version so this is in-line with "ip". But you have a point.

Btw, if you add this tool into iproute2 (which would be preferred), then
probably dl should be ok (and easier retrievable in that relation).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-03 15:17     ` Daniel Borkmann
@ 2016-02-04 13:22       ` Hannes Frederic Sowa
  2016-02-04 13:26         ` Jiri Pirko
  2016-02-04 19:01       ` Rosen, Rami
  1 sibling, 1 reply; 26+ messages in thread
From: Hannes Frederic Sowa @ 2016-02-04 13:22 UTC (permalink / raw)
  To: Daniel Borkmann, Jiri Pirko, Jesper Dangaard Brouer
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, jbenc

On 03.02.2016 16:17, Daniel Borkmann wrote:
> On 02/03/2016 02:33 PM, Jiri Pirko wrote:
>> Wed, Feb 03, 2016 at 02:31:33PM CET, brouer@redhat.com wrote:
>>> On Wed,  3 Feb 2016 11:47:56 +0100
>>> Jiri Pirko <jiri@resnulli.us> wrote:
>>>> From: Jiri Pirko <jiri@mellanox.com>
> [...]
>>>> It is available here:
>>>> https://github.com/jpirko/devlink
>>>
>>> IHMO this very short command name "dl" is not an advantage.  It is
>>> simply too unspecific and short for a good Google search.  E.g. when
>>> searching for good examples for using "dl".  I think "devlink" would be
>>> better.  If you like short commands do: alias dl="devlink"
>>
>> I was thinking about using "devlink". Decided to go with shortened
>> version so this is in-line with "ip". But you have a point.
>
> Btw, if you add this tool into iproute2 (which would be preferred), then
> probably dl should be ok (and easier retrievable in that relation).

This doesn't seem to be too much related to networking? Why can't 
something like this be in sysfs?

Bye,
Hannes

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-04 13:22       ` Hannes Frederic Sowa
@ 2016-02-04 13:26         ` Jiri Pirko
  2016-02-05 10:01           ` Hannes Frederic Sowa
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2016-02-04 13:26 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, Jesper Dangaard Brouer, netdev, davem, idosch,
	eladr, yotamg, ogerlitz, yishaih, dledford, sean.hefty,
	hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

Thu, Feb 04, 2016 at 02:22:17PM CET, hannes@stressinduktion.org wrote:
>On 03.02.2016 16:17, Daniel Borkmann wrote:
>>On 02/03/2016 02:33 PM, Jiri Pirko wrote:
>>>Wed, Feb 03, 2016 at 02:31:33PM CET, brouer@redhat.com wrote:
>>>>On Wed,  3 Feb 2016 11:47:56 +0100
>>>>Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>From: Jiri Pirko <jiri@mellanox.com>
>>[...]
>>>>>It is available here:
>>>>>https://github.com/jpirko/devlink
>>>>
>>>>IHMO this very short command name "dl" is not an advantage.  It is
>>>>simply too unspecific and short for a good Google search.  E.g. when
>>>>searching for good examples for using "dl".  I think "devlink" would be
>>>>better.  If you like short commands do: alias dl="devlink"
>>>
>>>I was thinking about using "devlink". Decided to go with shortened
>>>version so this is in-line with "ip". But you have a point.
>>
>>Btw, if you add this tool into iproute2 (which would be preferred), then
>>probably dl should be ok (and easier retrievable in that relation).
>
>This doesn't seem to be too much related to networking? Why can't something
>like this be in sysfs?

It is related to networking quite bit. There has been couple of
iteration of this, including sysfs and configfs implementations. There
has been a consensus reached that this should be done by netlink. I
believe netlink is really the best for this purpose. Sysfs is not a good
idea.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-03 15:17     ` Daniel Borkmann
  2016-02-04 13:22       ` Hannes Frederic Sowa
@ 2016-02-04 19:01       ` Rosen, Rami
  2016-02-05 14:29         ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 26+ messages in thread
From: Rosen, Rami @ 2016-02-04 19:01 UTC (permalink / raw)
  To: Daniel Borkmann, Jiri Pirko, Jesper Dangaard Brouer
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, Hefty, Sean, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, Kirsher, Jeffrey T, jbenc, Rosen,
	Rami

Hi,

>Btw, if you add this tool into iproute2 (which would be preferred), >then
>probably dl should be ok (and easier retrievable in that relation).

+1.

This tool, which uses netlink messages, seems a natural candidate for iproute2.
 
And also apart from "ip", we have also another short command in iproute2, namely "tc", which is yet another justification for using the short form with "dl".

Regards,
Rami Rosen
Intel Corporation

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-04 13:26         ` Jiri Pirko
@ 2016-02-05 10:01           ` Hannes Frederic Sowa
  2016-02-05 17:38             ` Alexei Starovoitov
  0 siblings, 1 reply; 26+ messages in thread
From: Hannes Frederic Sowa @ 2016-02-05 10:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Daniel Borkmann, Jesper Dangaard Brouer, netdev, davem, idosch,
	eladr, yotamg, ogerlitz, yishaih, dledford, sean.hefty,
	hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

On 04.02.2016 14:26, Jiri Pirko wrote:
> Thu, Feb 04, 2016 at 02:22:17PM CET, hannes@stressinduktion.org wrote:
>> On 03.02.2016 16:17, Daniel Borkmann wrote:
>>> On 02/03/2016 02:33 PM, Jiri Pirko wrote:
>>>> Wed, Feb 03, 2016 at 02:31:33PM CET, brouer@redhat.com wrote:
>>>>> On Wed,  3 Feb 2016 11:47:56 +0100
>>>>> Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>> [...]
>>>>>> It is available here:
>>>>>> https://github.com/jpirko/devlink
>>>>>
>>>>> IHMO this very short command name "dl" is not an advantage.  It is
>>>>> simply too unspecific and short for a good Google search.  E.g. when
>>>>> searching for good examples for using "dl".  I think "devlink" would be
>>>>> better.  If you like short commands do: alias dl="devlink"
>>>>
>>>> I was thinking about using "devlink". Decided to go with shortened
>>>> version so this is in-line with "ip". But you have a point.
>>>
>>> Btw, if you add this tool into iproute2 (which would be preferred), then
>>> probably dl should be ok (and easier retrievable in that relation).
>>
>> This doesn't seem to be too much related to networking? Why can't something
>> like this be in sysfs?
>
> It is related to networking quite bit. There has been couple of
> iteration of this, including sysfs and configfs implementations. There
> has been a consensus reached that this should be done by netlink. I
> believe netlink is really the best for this purpose. Sysfs is not a good
> idea.

Okay. I see it more as changing mode of operation of hardware and thus 
has not really anything to do with networking. If you say you change 
ethernet to infiniband it has something to do with networking, sure. But 
I am fine with this, I just thought the code size could be reduced by 
adding this to sysfs quite a lot. I don't have a strong opinion on this.

The attributes seem to reassemble sysfs quite a lot and at the same time 
implement a dependency on the ifindex which in a relational model is not 
really possible? Do infiniband devices have ifindexes? Names for 
infiniband and netdev are hardcoded? Shouldn't an interface to deliver 
hw msg be better in terms of streaming than using netlink?

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-04 19:01       ` Rosen, Rami
@ 2016-02-05 14:29         ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 26+ messages in thread
From: Jesper Dangaard Brouer @ 2016-02-05 14:29 UTC (permalink / raw)
  To: Rosen, Rami
  Cc: Daniel Borkmann, Jiri Pirko, netdev, davem, idosch, eladr,
	yotamg, ogerlitz, yishaih, dledford, Hefty, Sean, hal.rosenstock,
	eugenia, roopa, nikolay, hadarh, jhs, john.fastabend, Kirsher,
	Jeffrey T

On Thu, 4 Feb 2016 19:01:55 +0000
"Rosen, Rami" <rami.rosen@intel.com> wrote:

> Hi,
> 
> >Btw, if you add this tool into iproute2 (which would be preferred), >then
> >probably dl should be ok (and easier retrievable in that relation).  
> 
> +1.
> 
> This tool, which uses netlink messages, seems a natural candidate for iproute2.
>  
> And also apart from "ip", we have also another short command in
> iproute2, namely "tc", which is yet another justification for using
> the short form with "dl".

I agree that integration into iproute2 is a good idea.

I strongly disagree on introducing such a short command name. Today we
live in a world where you google for usage examples... the ip and tc
commands origin for another time, and IMHO their short names have hurt
them not getting broad usage.  Especially it have been difficult to get
people to really adopt "ip", which is also worst search term in the
Internet today...

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-05 10:01           ` Hannes Frederic Sowa
@ 2016-02-05 17:38             ` Alexei Starovoitov
  2016-02-06 19:40               ` Jiri Pirko
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2016-02-05 17:38 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Jiri Pirko, Daniel Borkmann, Jesper Dangaard Brouer, netdev,
	davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

On Fri, Feb 05, 2016 at 11:01:22AM +0100, Hannes Frederic Sowa wrote:
> 
> Okay. I see it more as changing mode of operation of hardware and thus has
> not really anything to do with networking. If you say you change ethernet to
> infiniband it has something to do with networking, sure. But I am fine with
> this, I just thought the code size could be reduced by adding this to sysfs
> quite a lot. I don't have a strong opinion on this.

there is already a way to change eth/ib via
echo 'eth' > /sys/bus/pci/drivers/mlx4_core/0000:02:00.0/mlx4_port1

sounds like this is another way to achieve the same?
Why not hide echo/cat in iproute2 instead of adding parallel netlink api?
Or this is for switches instead of nics?
Then why it's not adding to switchdev?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-05 17:38             ` Alexei Starovoitov
@ 2016-02-06 19:40               ` Jiri Pirko
  2016-02-08 10:15                 ` Hannes Frederic Sowa
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2016-02-06 19:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Hannes Frederic Sowa, Daniel Borkmann, Jesper Dangaard Brouer,
	netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, jbenc

Fri, Feb 05, 2016 at 06:38:42PM CET, alexei.starovoitov@gmail.com wrote:
>On Fri, Feb 05, 2016 at 11:01:22AM +0100, Hannes Frederic Sowa wrote:
>> 
>> Okay. I see it more as changing mode of operation of hardware and thus has
>> not really anything to do with networking. If you say you change ethernet to
>> infiniband it has something to do with networking, sure. But I am fine with
>> this, I just thought the code size could be reduced by adding this to sysfs
>> quite a lot. I don't have a strong opinion on this.
>
>there is already a way to change eth/ib via
>echo 'eth' > /sys/bus/pci/drivers/mlx4_core/0000:02:00.0/mlx4_port1
>
>sounds like this is another way to achieve the same?

It is. However the current way is driver-specific, not correct.
For mlx5, we need the same, it cannot be done in this way. Do devlink is
the correct way to go.


>Why not hide echo/cat in iproute2 instead of adding parallel netlink api?
>Or this is for switches instead of nics?
>Then why it's not adding to switchdev?

Note this is not specific to switch ASICs. This is for all network devices.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-03 10:47 [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it Jiri Pirko
                   ` (6 preceding siblings ...)
  2016-02-03 13:31 ` [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it Jesper Dangaard Brouer
@ 2016-02-07 20:18 ` roopa
  2016-02-08 19:00   ` Doug Ledford
  7 siblings, 1 reply; 26+ messages in thread
From: roopa @ 2016-02-07 20:18 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, nikolay, hadarh,
	jhs, john.fastabend, jeffrey.t.kirsher, jbenc

On 2/3/16, 2:47 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> There a is need for some userspace API that would allow to expose things
> that are not directly related to any device class like net_device of
> ib_device, but rather chip-wide/switch-ASIC-wide stuff.
>
> Use cases:
> 1) get/set of port type (Ethernet/InfiniBand)
> 2) monitoring of hardware messages to and from chip
> 3) setting up port splitters - split port into multiple ones and squash again,
>    enables usage of splitter cable
> 4) setting up shared buffers - shared among multiple ports within one chip
>
> First patch of this set introduces a new generic Netlink based interface,
> called "devlink". It is similar to nl80211 model and it is heavily
> influenced by it, including the API definition. The devlink introduction patch
> implements use cases 1) and 2). Other 2 are in development atm and will
> be addressed by follow-ups.
>
> It is very convenient for drivers to use devlink, as you can see in other
> patches in this set.
>
> Counterpart for devlink is userspace tool called "dl". Command line interface
> and outputs are derived from "ip" tool so it should be easy for users to get
> used to it.
>
> It is available here:
> https://github.com/jpirko/devlink

Jiri, thanks for this series!. Something like this is definitely needed for chip specific data. But i thought we were going to limit it to chip specific global attributes that cannot be set on a port.
>
> Example usage:
> butter:~$ dl help
> Usage: dl [ OPTIONS ] OBJECT { COMMAND | help }
> where  OBJECT := { dev | port | monitor }
>        OPTIONS := { -v/--verbose }
>
> butter:~$ dl dev show
> 0: devlink0: bus pci dev 0000:01:00.0
>
> butter:~$ dl port help
> Usage: dl port show [DEV/PORT_INDEX]
> Usage: dl port set DEV/PORT_INDEX [ type { eth | ib | auto} ]

I don't think we should include port specific attributes in this api. ports are netdevs and they should still continue to use 'ip link set'.

So, why not just introduce a rtnetlink ops abstraction for switch ports ?. Example:

static struct rtnl_link_ops swport_link_ops __read_mostly = {
        .kind           = 'swport',
        .setup          = swport_setup,
        .validate       = swport_validate,
};

IFLA_INFO_KIND = swport
IFLA_INFO_DATA = {
            IFLA_SWPORT_KIND = 'mlx'
            IFLA_SWPORT_TYPE, /* u16 */
            IFLA_SWPORT_DESIRED_TYPE, /* u16 */

            ... more ...
}

IFLA_INFO_DATA can be redirected to the specific switch driver at the switchdev ops/api layer.

ip link set dev swp1 swport_type eth swport_desired_type <blah>

Taking this one step further, the port splitter management can be done in user-space:
And, users/switch port management infra can use 'ip link add type swport ...' to create required switch ports.

Thanks,
Roopa



             

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-06 19:40               ` Jiri Pirko
@ 2016-02-08 10:15                 ` Hannes Frederic Sowa
  2016-02-08 10:55                   ` Jiri Pirko
  0 siblings, 1 reply; 26+ messages in thread
From: Hannes Frederic Sowa @ 2016-02-08 10:15 UTC (permalink / raw)
  To: Jiri Pirko, Alexei Starovoitov
  Cc: Daniel Borkmann, Jesper Dangaard Brouer, netdev, davem, idosch,
	eladr, yotamg, ogerlitz, yishaih, dledford, sean.hefty,
	hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

Hello,

On 06.02.2016 20:40, Jiri Pirko wrote:
> Fri, Feb 05, 2016 at 06:38:42PM CET, alexei.starovoitov@gmail.com wrote:
>> On Fri, Feb 05, 2016 at 11:01:22AM +0100, Hannes Frederic Sowa wrote:
>>>
>>> Okay. I see it more as changing mode of operation of hardware and thus has
>>> not really anything to do with networking. If you say you change ethernet to
>>> infiniband it has something to do with networking, sure. But I am fine with
>>> this, I just thought the code size could be reduced by adding this to sysfs
>>> quite a lot. I don't have a strong opinion on this.
>>
>> there is already a way to change eth/ib via
>> echo 'eth' > /sys/bus/pci/drivers/mlx4_core/0000:02:00.0/mlx4_port1
>>
>> sounds like this is another way to achieve the same?
>
> It is. However the current way is driver-specific, not correct.

Why is driver specific not correct? Actually it is very much a device 
specific thing, isn't it?

> For mlx5, we need the same, it cannot be done in this way. Do devlink is
> the correct way to go.

Do two drivers already justify a new complete netlink api? Doesn't this 
create the same problems like netdevice naming problems which needed 
multiple years to become stable in case we have multiple cards or some 
administrator reorders the cards (biosdevorder, systemd/udev issues)? 
Are ports always stable? How can we have a 1:1 relationship with 
ifindexes and how can they be stable? It is impossible to use that in 
scripts?

>> Why not hide echo/cat in iproute2 instead of adding parallel netlink api?
>> Or this is for switches instead of nics?
>> Then why it's not adding to switchdev?
>
> Note this is not specific to switch ASICs. This is for all network devices.

That's actually my fear. The relationship from "devlink-names" to 
ifindexes I didn't understand at all architecturally.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-08 10:15                 ` Hannes Frederic Sowa
@ 2016-02-08 10:55                   ` Jiri Pirko
  2016-02-08 12:11                     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2016-02-08 10:55 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, jbenc

Mon, Feb 08, 2016 at 11:15:38AM CET, hannes@stressinduktion.org wrote:
>Hello,
>
>On 06.02.2016 20:40, Jiri Pirko wrote:
>>Fri, Feb 05, 2016 at 06:38:42PM CET, alexei.starovoitov@gmail.com wrote:
>>>On Fri, Feb 05, 2016 at 11:01:22AM +0100, Hannes Frederic Sowa wrote:
>>>>
>>>>Okay. I see it more as changing mode of operation of hardware and thus has
>>>>not really anything to do with networking. If you say you change ethernet to
>>>>infiniband it has something to do with networking, sure. But I am fine with
>>>>this, I just thought the code size could be reduced by adding this to sysfs
>>>>quite a lot. I don't have a strong opinion on this.
>>>
>>>there is already a way to change eth/ib via
>>>echo 'eth' > /sys/bus/pci/drivers/mlx4_core/0000:02:00.0/mlx4_port1
>>>
>>>sounds like this is another way to achieve the same?
>>
>>It is. However the current way is driver-specific, not correct.
>
>Why is driver specific not correct? Actually it is very much a device
>specific thing, isn't it?

Well, adding driver specific sysfs file called "driver_name_port_type"
does not seem correct to me.

>
>>For mlx5, we need the same, it cannot be done in this way. Do devlink is
>>the correct way to go.
>
>Do two drivers already justify a new complete netlink api? Doesn't this
>create the same problems like netdevice naming problems which needed multiple
>years to become stable in case we have multiple cards or some administrator

The thing is, other driver would use it as well, but there's no way to
do it :) So vendors have their proprietary configuration utils. Devlink
objective is to avoid those, to introduce vendor-neutral interface.


>reorders the cards (biosdevorder, systemd/udev issues)? Are ports always
>stable? How can we have a 1:1 relationship with ifindexes and how can they be
>stable? It is impossible to use that in scripts?

Port index is setup by driver always, they have stable internal
numbering. devlink device name is not stable (as for example netdev
name), but can be easily identified by bus name and device name. I don't
see a reason why udev cannot rename it according to some rules. By the
way, this is very similar to phyX wireless devices.


>
>>>Why not hide echo/cat in iproute2 instead of adding parallel netlink api?
>>>Or this is for switches instead of nics?
>>>Then why it's not adding to switchdev?
>>
>>Note this is not specific to switch ASICs. This is for all network devices.
>
>That's actually my fear. The relationship from "devlink-names" to ifindexes I
>didn't understand at all architecturally.

Again, this is very similar to phyX wireless devices.
I don't understand the reason for your fear :)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-08 10:55                   ` Jiri Pirko
@ 2016-02-08 12:11                     ` Hannes Frederic Sowa
  0 siblings, 0 replies; 26+ messages in thread
From: Hannes Frederic Sowa @ 2016-02-08 12:11 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, jbenc

Hi,

On 08.02.2016 11:55, Jiri Pirko wrote:
> Mon, Feb 08, 2016 at 11:15:38AM CET, hannes@stressinduktion.org wrote:
>> Hello,
>>
>> On 06.02.2016 20:40, Jiri Pirko wrote:
>>> Fri, Feb 05, 2016 at 06:38:42PM CET, alexei.starovoitov@gmail.com wrote:
>>>> On Fri, Feb 05, 2016 at 11:01:22AM +0100, Hannes Frederic Sowa wrote:
>>>>>
>>>>> Okay. I see it more as changing mode of operation of hardware and thus has
>>>>> not really anything to do with networking. If you say you change ethernet to
>>>>> infiniband it has something to do with networking, sure. But I am fine with
>>>>> this, I just thought the code size could be reduced by adding this to sysfs
>>>>> quite a lot. I don't have a strong opinion on this.
>>>>
>>>> there is already a way to change eth/ib via
>>>> echo 'eth' > /sys/bus/pci/drivers/mlx4_core/0000:02:00.0/mlx4_port1
>>>>
>>>> sounds like this is another way to achieve the same?
>>>
>>> It is. However the current way is driver-specific, not correct.
>>
>> Why is driver specific not correct? Actually it is very much a device
>> specific thing, isn't it?
>
> Well, adding driver specific sysfs file called "driver_name_port_type"
> does not seem correct to me.

Why? PHYs are debugged like that? I thought that especially sysfs is the 
right thing, it makes sure we can correctly identify a device. The logic 
in devlink_alloc by just incrementing a counter and having the naming 
policy be decided by driver registration time will introduce the same 
problems like identifying devices by interfaces had before.

>>> For mlx5, we need the same, it cannot be done in this way. Do devlink is
>>> the correct way to go.
>>
>> Do two drivers already justify a new complete netlink api? Doesn't this
>> create the same problems like netdevice naming problems which needed multiple
>> years to become stable in case we have multiple cards or some administrator
>
> The thing is, other driver would use it as well, but there's no way to
> do it :) So vendors have their proprietary configuration utils. Devlink
> objective is to avoid those, to introduce vendor-neutral interface.

Ok, agreed. But multiple driver reuse the phy-sysfs routines, too. I 
didn't see this to be a problem.

Anyway, I don't care if it is sysfs or something else, I am concerned 
about the atomic_inc_return based identification of those devices.

>> reorders the cards (biosdevorder, systemd/udev issues)? Are ports always
>> stable? How can we have a 1:1 relationship with ifindexes and how can they be
>> stable? It is impossible to use that in scripts?
>
> Port index is setup by driver always, they have stable internal
> numbering. devlink device name is not stable (as for example netdev
> name), but can be easily identified by bus name and device name. I don't
> see a reason why udev cannot rename it according to some rules. By the
> way, this is very similar to phyX wireless devices.

Ok, understood. It just seems to be duplication of code with another name.

>>>> Why not hide echo/cat in iproute2 instead of adding parallel netlink api?
>>>> Or this is for switches instead of nics?
>>>> Then why it's not adding to switchdev?
>>>
>>> Note this is not specific to switch ASICs. This is for all network devices.
>>
>> That's actually my fear. The relationship from "devlink-names" to ifindexes I
>> didn't understand at all architecturally.
>
> Again, this is very similar to phyX wireless devices.
> I don't understand the reason for your fear :)

If, as you said, this gets integrated by systemd/udev and will change 
names to stable ones before switching ports (so we don't accidentally 
switch a wrong port) I am all fine. This is basically how net_devices 
are handled.

Then my only argument is that this is too complex, but I can live with that.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it
  2016-02-07 20:18 ` roopa
@ 2016-02-08 19:00   ` Doug Ledford
  0 siblings, 0 replies; 26+ messages in thread
From: Doug Ledford @ 2016-02-08 19:00 UTC (permalink / raw)
  To: roopa, Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	sean.hefty, hal.rosenstock, eugenia, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

[-- Attachment #1: Type: text/plain, Size: 4777 bytes --]

On 02/07/2016 03:18 PM, roopa wrote:
> On 2/3/16, 2:47 AM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> There a is need for some userspace API that would allow to expose things
>> that are not directly related to any device class like net_device of
>> ib_device, but rather chip-wide/switch-ASIC-wide stuff.
>>
>> Use cases:
>> 1) get/set of port type (Ethernet/InfiniBand)
>> 2) monitoring of hardware messages to and from chip
>> 3) setting up port splitters - split port into multiple ones and squash again,
>>    enables usage of splitter cable
>> 4) setting up shared buffers - shared among multiple ports within one chip
>>
>> First patch of this set introduces a new generic Netlink based interface,
>> called "devlink". It is similar to nl80211 model and it is heavily
>> influenced by it, including the API definition. The devlink introduction patch
>> implements use cases 1) and 2). Other 2 are in development atm and will
>> be addressed by follow-ups.
>>
>> It is very convenient for drivers to use devlink, as you can see in other
>> patches in this set.
>>
>> Counterpart for devlink is userspace tool called "dl". Command line interface
>> and outputs are derived from "ip" tool so it should be easy for users to get
>> used to it.
>>
>> It is available here:
>> https://github.com/jpirko/devlink
> 
> Jiri, thanks for this series!. Something like this is definitely needed for chip specific data. But i thought we were going to limit it to chip specific global attributes that cannot be set on a port.
>>
>> Example usage:
>> butter:~$ dl help
>> Usage: dl [ OPTIONS ] OBJECT { COMMAND | help }
>> where  OBJECT := { dev | port | monitor }
>>        OPTIONS := { -v/--verbose }
>>
>> butter:~$ dl dev show
>> 0: devlink0: bus pci dev 0000:01:00.0
>>
>> butter:~$ dl port help
>> Usage: dl port show [DEV/PORT_INDEX]
>> Usage: dl port set DEV/PORT_INDEX [ type { eth | ib | auto} ]
> 
> I don't think we should include port specific attributes in this api. ports are netdevs and they should still continue to use 'ip link set'.

That's not true.  InfiniBand ports don't have a netdev (unless you load
IPoIB, and then it's an emulated netdev, but the actual port itself has
none).  So as a general rule, you can never run ip link set on an
InfiniBand port.  You can run ip link set on the IPoIB device on that
port, but the features/bits you can set that way are specific to the
IPoIB device and not the parent IB device.

With that in mind, there has been discussions around the SRIOV settings
on InfiniBand ports.  If you have an Ethernet device, there are well
defined means of setting the parameters of the VFs via the PF.  Since an
InfiniBand device doesn't have a netdev, we are currently rolling our
own methods.  A couple solutions have been floated, neither one has been
coded up.  One was to use netlink directly on the InfiniBand PF to set
VF features, but that goes back to the problem that there is no netdev
for the IB port, so we would have to either register a fake dev or
something.  The other idea was to make IPoIB devices mandatory if you
have SRIOV IB devices, and then use the IPoIB device as the netdev to
configure and then add some new entry points for configuring IB devices
via an IPoIB netdev.  There are obvious differences, such as the VF GID
length would not come even close to being the same as the IPoIB dev's
netdev MAC length, possibly multiple P_Keys need configured on the VF
while the IPoIB dev has a single fixed P_Key per device, etc.  But the
whole making IPoIB mandatory and then using one class of PF device to
configure a different class of VF device is obviously just a really ugly
hack.

> So, why not just introduce a rtnetlink ops abstraction for switch ports ?. Example:
> 
> static struct rtnl_link_ops swport_link_ops __read_mostly = {
>         .kind           = 'swport',
>         .setup          = swport_setup,
>         .validate       = swport_validate,
> };
> 
> IFLA_INFO_KIND = swport
> IFLA_INFO_DATA = {
>             IFLA_SWPORT_KIND = 'mlx'
>             IFLA_SWPORT_TYPE, /* u16 */
>             IFLA_SWPORT_DESIRED_TYPE, /* u16 */
> 
>             ... more ...
> }
> 
> IFLA_INFO_DATA can be redirected to the specific switch driver at the switchdev ops/api layer.
> 
> ip link set dev swp1 swport_type eth swport_desired_type <blah>
> 
> Taking this one step further, the port splitter management can be done in user-space:
> And, users/switch port management infra can use 'ip link add type swport ...' to create required switch ports.
> 
> Thanks,
> Roopa
> 
> 
> 
>              
> 


-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 1/6] Introduce devlink infrastructure
  2016-02-03 10:47 ` [patch net-next RFC 1/6] Introduce devlink infrastructure Jiri Pirko
@ 2016-02-11 14:31   ` Ivan Vecera
  2016-02-11 16:54     ` Jiri Pirko
  0 siblings, 1 reply; 26+ messages in thread
From: Ivan Vecera @ 2016-02-11 14:31 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

On 3.2.2016 11:47, Jiri Pirko wrote:
> +struct devlink_ops {
> +	size_t priv_size;
> +	int (*port_type_set)(struct devlink_port *devlink_port,
> +			     enum devlink_port_type port_type);
> +};
It does not make sense to have priv_size member here... If it is 
necessary it should be placed in struct devlink.

Ivan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 1/6] Introduce devlink infrastructure
  2016-02-11 14:31   ` Ivan Vecera
@ 2016-02-11 16:54     ` Jiri Pirko
  0 siblings, 0 replies; 26+ messages in thread
From: Jiri Pirko @ 2016-02-11 16:54 UTC (permalink / raw)
  To: Ivan Vecera
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, jbenc

Thu, Feb 11, 2016 at 03:31:36PM CET, ivecera@redhat.com wrote:
>On 3.2.2016 11:47, Jiri Pirko wrote:
>>+struct devlink_ops {
>>+	size_t priv_size;
>>+	int (*port_type_set)(struct devlink_port *devlink_port,
>>+			     enum devlink_port_type port_type);
>>+};
>It does not make sense to have priv_size member here... If it is necessary it
>should be placed in struct devlink.

I forgot to remove this. Will do that. Thanks!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 4/6] mlx4: Implement devlink interface
  2016-02-03 10:48 ` [patch net-next RFC 4/6] mlx4: Implement devlink interface Jiri Pirko
@ 2016-02-16 16:43   ` Or Gerlitz
  2016-02-16 16:51     ` Jiri Pirko
  0 siblings, 1 reply; 26+ messages in thread
From: Or Gerlitz @ 2016-02-16 16:43 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

On 2/3/2016 12:48 PM, Jiri Pirko wrote:
> Implement newly introduced devlink interface. Add devlink port instances
> for every port and set the port types accordingly.
>

Lets see how to use the newly introduced interface in a way which would 
allow us to get a configuration
profile from user-space and program it to the firmware. This is generic 
problem for other drivers too - we've triedsolving it with configfs in 
the past but the patches were not accepted.

Or.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch net-next RFC 4/6] mlx4: Implement devlink interface
  2016-02-16 16:43   ` Or Gerlitz
@ 2016-02-16 16:51     ` Jiri Pirko
  0 siblings, 0 replies; 26+ messages in thread
From: Jiri Pirko @ 2016-02-16 16:51 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: netdev, davem, idosch, eladr, yotamg, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, jbenc

Tue, Feb 16, 2016 at 05:43:53PM CET, ogerlitz@mellanox.com wrote:
>On 2/3/2016 12:48 PM, Jiri Pirko wrote:
>>Implement newly introduced devlink interface. Add devlink port instances
>>for every port and set the port types accordingly.
>>
>
>Lets see how to use the newly introduced interface in a way which would allow
>us to get a configuration
>profile from user-space and program it to the firmware. This is generic
>problem for other drivers too - we've triedsolving it with configfs in the
>past but the patches were not accepted.

That we plan to do for mlxsw as well via devlink. That will be in a
follow-up patchset.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-02-16 16:51 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-03 10:47 [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it Jiri Pirko
2016-02-03 10:47 ` [patch net-next RFC 1/6] Introduce devlink infrastructure Jiri Pirko
2016-02-11 14:31   ` Ivan Vecera
2016-02-11 16:54     ` Jiri Pirko
2016-02-03 10:47 ` [patch net-next RFC 2/6] mlxsw: Implement devlink interface Jiri Pirko
2016-02-03 10:47 ` [patch net-next RFC 3/6] mlxsw: Implement hardware messages notification using devlink Jiri Pirko
2016-02-03 10:48 ` [patch net-next RFC 4/6] mlx4: Implement devlink interface Jiri Pirko
2016-02-16 16:43   ` Or Gerlitz
2016-02-16 16:51     ` Jiri Pirko
2016-02-03 10:48 ` [patch net-next RFC 5/6] mlx4: Implement hardware messages notification using devlink Jiri Pirko
2016-02-03 10:48 ` [patch net-next RFC 6/6] mlx4: Implement port type setting via devlink interface Jiri Pirko
2016-02-03 13:31 ` [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it Jesper Dangaard Brouer
2016-02-03 13:33   ` Jiri Pirko
2016-02-03 15:17     ` Daniel Borkmann
2016-02-04 13:22       ` Hannes Frederic Sowa
2016-02-04 13:26         ` Jiri Pirko
2016-02-05 10:01           ` Hannes Frederic Sowa
2016-02-05 17:38             ` Alexei Starovoitov
2016-02-06 19:40               ` Jiri Pirko
2016-02-08 10:15                 ` Hannes Frederic Sowa
2016-02-08 10:55                   ` Jiri Pirko
2016-02-08 12:11                     ` Hannes Frederic Sowa
2016-02-04 19:01       ` Rosen, Rami
2016-02-05 14:29         ` Jesper Dangaard Brouer
2016-02-07 20:18 ` roopa
2016-02-08 19:00   ` Doug Ledford

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.