All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch net-next 0/9] Introduce devlink interface and first drivers to use it
@ 2016-02-22 18:31 Jiri Pirko
  2016-02-22 18:31 ` [patch net-next 1/9] Introduce devlink infrastructure Jiri Pirko
                   ` (9 more replies)
  0 siblings, 10 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, brouer, ivecera, rami.rosen

From: Jiri Pirko <jiri@mellanox.com>

There a is need for some userspace API that would allow to expose things
that are not directly related to any device class like net_device of
ib_device, but rather chip-wide/switch-ASIC-wide stuff.

Use cases:
1) get/set of port type (Ethernet/InfiniBand)
2) setting up port splitters - split port into multiple ones and squash again,
   enables usage of splitter cable
3) setting up shared buffers - shared among multiple ports within
   one chip (work in progress)
4) configuration of switch wide properties - resources division etc - This will
   allow to pass configuration that is unacceptable to be passed as
   a module option.

First patch of this set introduces a new generic Netlink based interface,
called "devlink". It is similar to nl80211 model and it is heavily
influenced by it, including the API definition. The devlink introduction patch
implements use cases 1) and 2). Other 2 are in development atm and will
be addressed by follow-ups.

It is very convenient for drivers to use devlink, as you can see in other
patches in this set.

Counterpart for devlink is userspace tool for now called "dl". Command line
interface and outputs are derived from "ip" tool so it should be easy
for users to get used to it.

It is available here as a standalone tool for now:
https://github.com/jpirko/devlink
After this is merge in kernel, I will include the "dl" or "devlink" tool
into iproute2 toolset.

Port type setting example:
	myhost:~$ dl help
	Usage: dl [ OPTIONS ] OBJECT { COMMAND | help }
	where  OBJECT := { dev | port | monitor }
	       OPTIONS := { -v/--verbose }

	myhost:~$ dl dev help
	Usage: dl dev show [DEV]
	Usage: dl dev set DEV [ name NEWNAME ]
	
	myhost:~$ dl dev show
	0: devlink0: bus pci dev 0000:01:00.0
	
	myhost:~$ dl port help
	Usage: dl port show [DEV/PORT_INDEX]
	Usage: dl port set DEV/PORT_INDEX [ type { eth | ib | auto} ]
	Usage: dl port split DEV/PORT_INDEX count
	Usage: dl port unsplit DEV/PORT_INDEX
	
	myhost:~$ dl port show
	devlink0/1: type ib ibdev mlx4_0
	devlink0/2: type ib ibdev mlx4_0
	
	myhost:~$ sudo dl port set devlink0/1 type eth
	
	myhost:~$ dl port show
	devlink0/1: type eth netdev ens4
	devlink0/2: type ib ibdev mlx4_0
	
	myhost:~$ sudo dl port set devlink0/2 type auto
	
	myhost:~$ dl port show
	devlink0/1: type eth netdev ens4
	devlink0/2: type ib(auto) ibdev mlx4_0

Port splitting example:
	myswitch:~$ dl port
	devlink0/1: type eth netdev eth0
	devlink0/3: type eth netdev eth1
	devlink0/5: type eth netdev eth2
	...
	devlink0/63: type eth netdev eth31
	
	myswitch:~$ sudo dl port split devlink0/1 2
	
	myswitch:~$ dl port
	devlink0/3: type eth netdev eth1
	devlink0/5: type eth netdev eth2
	...
	devlink0/63: type eth netdev eth31
	devlink0/1: type eth netdev eth0 split_group 16
	devlink0/2: type eth netdev eth32 split_group 16
	
	myswitch:~$ sudo dl port unsplit devlink0/1
	
	myswitch:~$ dl port
	devlink0/3: type eth netdev eth1
	devlink0/5: type eth netdev eth2
	...
	devlink0/63: type eth netdev eth31
	devlink0/1: type eth netdev eth0

Ido Schimmel (4):
  mlxsw: spectrum: Unmap local port from module during teardown
  mlxsw: spectrum: Store local port to module mapping during init
  mlxsw: spectrum: Mark unused ports using NULL
  mlxsw: spectrum: Introduce port splitting

Jiri Pirko (5):
  Introduce devlink infrastructure
  mlx4: Implement devlink interface
  mlx4: Implement port type setting via devlink interface
  mlxsw: Implement devlink interface
  mlxsw: core: Add devlink port splitter callbacks

 MAINTAINERS                                    |   8 +
 drivers/infiniband/hw/mlx4/main.c              |   7 +
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   8 +-
 drivers/net/ethernet/mellanox/mlx4/intf.c      |   9 +
 drivers/net/ethernet/mellanox/mlx4/main.c      | 129 +++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h      |   2 +
 drivers/net/ethernet/mellanox/mlxsw/core.c     |  56 +-
 drivers/net/ethernet/mellanox/mlxsw/core.h     |   2 +
 drivers/net/ethernet/mellanox/mlxsw/port.h     |   2 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 238 ++++++-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |   8 +-
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c |  20 +
 include/linux/mlx4/driver.h                    |   3 +
 include/net/devlink.h                          | 156 +++++
 include/uapi/linux/devlink.h                   |  73 ++
 net/Kconfig                                    |   7 +
 net/core/Makefile                              |   1 +
 net/core/devlink.c                             | 887 +++++++++++++++++++++++++
 18 files changed, 1557 insertions(+), 59 deletions(-)
 create mode 100644 include/net/devlink.h
 create mode 100644 include/uapi/linux/devlink.h
 create mode 100644 net/core/devlink.c

-- 
2.5.0

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch net-next 1/9] Introduce devlink infrastructure
  2016-02-22 18:31 [patch net-next 0/9] Introduce devlink interface and first drivers to use it Jiri Pirko
@ 2016-02-22 18:31 ` Jiri Pirko
  2016-02-22 22:29   ` roopa
  2016-02-24  7:02   ` Yuval Mintz
  2016-02-22 18:31 ` [patch net-next 2/9] mlx4: Implement devlink interface Jiri Pirko
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, brouer, ivecera, rami.rosen

From: Jiri Pirko <jiri@mellanox.com>

Introduce devlink infrastructure for drivers to register and expose to
userspace via generic Netlink interface.

There are two basic objects defined:
devlink - one instance for every "parent device", for example switch ASIC
devlink port - one instance for every physical port of the device.

This initial portion implements basic get/dump of objects to userspace.
Also, port splitter and port type setting is implemented.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 MAINTAINERS                  |   8 +
 include/net/devlink.h        | 156 ++++++++
 include/uapi/linux/devlink.h |  73 ++++
 net/Kconfig                  |   7 +
 net/core/Makefile            |   1 +
 net/core/devlink.c           | 887 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1132 insertions(+)
 create mode 100644 include/net/devlink.h
 create mode 100644 include/uapi/linux/devlink.h
 create mode 100644 net/core/devlink.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 355e1c8..caad3e6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3502,6 +3502,14 @@ F:	include/linux/device-mapper.h
 F:	include/linux/dm-*.h
 F:	include/uapi/linux/dm-*.h
 
+DEVLINK
+M:	Jiri Pirko <jiri@mellanox.com>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	net/core/devlink.c
+F:	include/net/devlink.h
+F:	include/uapi/linux/devlink.h
+
 DIALOG SEMICONDUCTOR DRIVERS
 M:	Support Opensource <support.opensource@diasemi.com>
 W:	http://www.dialog-semiconductor.com/products
diff --git a/include/net/devlink.h b/include/net/devlink.h
new file mode 100644
index 0000000..ad34754
--- /dev/null
+++ b/include/net/devlink.h
@@ -0,0 +1,156 @@
+/*
+ * include/net/devlink.h - Network physical device Netlink interface
+ * Copyright (c) 2016 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#ifndef _NET_DEVLINK_H_
+#define _NET_DEVLINK_H_
+
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/netdevice.h>
+#include <net/net_namespace.h>
+#include <uapi/linux/devlink.h>
+
+struct devlink_ops;
+
+struct devlink {
+	struct list_head list;
+	struct list_head port_list;
+	int index;
+	const struct devlink_ops *ops;
+	struct device dev;
+	possible_net_t _net;
+	char priv[0] __aligned(NETDEV_ALIGN);
+};
+
+struct devlink_port {
+	struct list_head list;
+	struct devlink *devlink;
+	unsigned index;
+	bool registered;
+	enum devlink_port_type type;
+	enum devlink_port_type desired_type;
+	void *type_dev;
+	bool split;
+	u32 split_group;
+};
+
+struct devlink_ops {
+	size_t priv_size;
+	int (*port_type_set)(struct devlink_port *devlink_port,
+			     enum devlink_port_type port_type);
+	int (*port_split)(struct devlink *devlink, unsigned int port_index,
+			  unsigned int count);
+	int (*port_unsplit)(struct devlink *devlink, unsigned int port_index);
+};
+
+static inline void *devlink_priv(struct devlink *devlink)
+{
+	BUG_ON(!devlink);
+	return &devlink->priv;
+}
+
+static inline struct devlink *priv_to_devlink(void *priv)
+{
+	BUG_ON(!priv);
+	return container_of(priv, struct devlink, priv);
+}
+
+static inline struct device *devlink_dev(struct devlink *devlink)
+{
+	return &devlink->dev;
+}
+
+static inline void set_devlink_dev(struct devlink *devlink, struct device *dev)
+{
+	devlink->dev.parent = dev;
+}
+
+static inline const char *devlink_name(const struct devlink *devlink)
+{
+	return dev_name(&devlink->dev);
+}
+
+struct ib_device;
+
+#if IS_ENABLED(CONFIG_NET_DEVLINK)
+
+struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
+int devlink_register(struct devlink *devlink);
+void devlink_unregister(struct devlink *devlink);
+void devlink_free(struct devlink *devlink);
+int devlink_port_register(struct devlink *devlink,
+			  struct devlink_port *devlink_port,
+			  unsigned int port_index);
+void devlink_port_unregister(struct devlink_port *devlink_port);
+void devlink_port_type_eth_set(struct devlink_port *devlink_port,
+			       struct net_device *netdev);
+void devlink_port_type_ib_set(struct devlink_port *devlink_port,
+			      struct ib_device *ibdev);
+void devlink_port_type_clear(struct devlink_port *devlink_port);
+void devlink_port_split_set(struct devlink_port *devlink_port,
+			    u32 split_group);
+
+#else
+
+static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
+					    size_t priv_size)
+{
+	return kzalloc(sizeof(struct devlink) + priv_size, GFP_KERNEL);
+}
+
+static inline int devlink_register(struct devlink *devlink)
+{
+	return 0;
+}
+
+static inline void devlink_unregister(struct devlink *devlink)
+{
+}
+
+static inline void devlink_free(struct devlink *devlink)
+{
+	kfree(devlink);
+}
+
+static inline int devlink_port_register(struct devlink *devlink,
+					struct devlink_port *devlink_port,
+					unsigned int port_index)
+{
+	return 0;
+}
+
+static inline void devlink_port_unregister(struct devlink_port *devlink_port)
+{
+}
+
+static inline void devlink_port_type_eth_set(struct devlink_port *devlink_port,
+					     struct net_device *netdev)
+{
+}
+
+static inline void devlink_port_type_ib_set(struct devlink_port *devlink_port,
+					    struct ib_device *ibdev)
+{
+}
+
+static inline void devlink_port_type_clear(struct devlink_port *devlink_port)
+{
+}
+
+static inline void devlink_port_split_set(struct devlink_port *devlink_port,
+					  u32 split_group)
+{
+}
+
+#endif
+
+#endif /* _NET_DEVLINK_H_ */
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
new file mode 100644
index 0000000..04e0376
--- /dev/null
+++ b/include/uapi/linux/devlink.h
@@ -0,0 +1,73 @@
+/*
+ * include/uapi/linux/devlink.h - Network physical device Netlink interface
+ * Copyright (c) 2016 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _UAPI_LINUX_DEVLINK_H_
+#define _UAPI_LINUX_DEVLINK_H_
+
+#define DEVLINK_GENL_NAME "devlink"
+#define DEVLINK_GENL_VERSION 0x1
+#define DEVLINK_GENL_MCGRP_CONFIG_NAME "config"
+
+enum devlink_command {
+	/* don't change the order or add anything between, this is ABI! */
+	DEVLINK_CMD_UNSPEC,
+
+	DEVLINK_CMD_GET,		/* can dump */
+	DEVLINK_CMD_SET,
+	DEVLINK_CMD_NEW,
+	DEVLINK_CMD_DEL,
+
+	DEVLINK_CMD_PORT_GET,		/* can dump */
+	DEVLINK_CMD_PORT_SET,
+	DEVLINK_CMD_PORT_NEW,
+	DEVLINK_CMD_PORT_DEL,
+
+	DEVLINK_CMD_PORT_SPLIT,
+	DEVLINK_CMD_PORT_UNSPLIT,
+
+	/* add new commands above here */
+
+	__DEVLINK_CMD_MAX,
+	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
+};
+
+enum devlink_port_type {
+	DEVLINK_PORT_TYPE_NOTSET,
+	DEVLINK_PORT_TYPE_AUTO,
+	DEVLINK_PORT_TYPE_ETH,
+	DEVLINK_PORT_TYPE_IB,
+};
+
+enum devlink_attr {
+	/* don't change the order or add anything between, this is ABI! */
+	DEVLINK_ATTR_UNSPEC,
+
+	DEVLINK_ATTR_INDEX,			/* u32 */
+#define DEVLINK_ATTR_NAME_MAX_LEN 32
+	DEVLINK_ATTR_NAME,			/* string */
+	DEVLINK_ATTR_BUS_NAME,			/* string */
+	DEVLINK_ATTR_DEV_NAME,			/* string */
+	DEVLINK_ATTR_PORT_INDEX,		/* u32 */
+	DEVLINK_ATTR_PORT_TYPE,			/* u16 */
+	DEVLINK_ATTR_PORT_DESIRED_TYPE,		/* u16 */
+	DEVLINK_ATTR_PORT_NETDEV_IFINDEX,	/* u32 */
+	DEVLINK_ATTR_PORT_NETDEV_NAME,		/* string */
+	DEVLINK_ATTR_PORT_IBDEV_NAME,		/* string */
+	DEVLINK_ATTR_PORT_SPLIT_COUNT,		/* u32 */
+	DEVLINK_ATTR_PORT_SPLIT_GROUP,		/* u32 */
+
+	/* add new attributes above here, update the policy in devlink.c */
+
+	__DEVLINK_ATTR_MAX,
+	DEVLINK_ATTR_MAX = __DEVLINK_ATTR_MAX - 1
+};
+
+#endif /* _UAPI_LINUX_DEVLINK_H_ */
diff --git a/net/Kconfig b/net/Kconfig
index b80efec..6c9cfb0 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -396,6 +396,13 @@ config DST_CACHE
 	bool "dst cache"
 	default n
 
+config NET_DEVLINK
+	tristate "Network physical/parent device Netlink interface"
+	help
+	  Network physical/parent device Netlink interface provides
+	  infrastructure to support access to physical chip-wide config and
+	  monitoring.
+
 endif   # if NET
 
 # Used by archs to tell that they support BPF_JIT
diff --git a/net/core/Makefile b/net/core/Makefile
index 7a8fb8a..014422e 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -25,3 +25,4 @@ obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
 obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
 obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
 obj-$(CONFIG_DST_CACHE) += dst_cache.o
+obj-$(CONFIG_NET_DEVLINK) += devlink.o
diff --git a/net/core/devlink.c b/net/core/devlink.c
new file mode 100644
index 0000000..d32c5fc
--- /dev/null
+++ b/net/core/devlink.c
@@ -0,0 +1,887 @@
+/*
+ * net/core/devlink.c - Network physical/parent device Netlink interface
+ *
+ * Heavily inspired by net/wireless/
+ * Copyright (c) 2016 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/gfp.h>
+#include <linux/device.h>
+#include <linux/list.h>
+#include <linux/netdevice.h>
+#include <linux/sysfs.h>
+#include <linux/bitops.h>
+#include <rdma/ib_verbs.h>
+#include <net/netlink.h>
+#include <net/genetlink.h>
+#include <net/rtnetlink.h>
+#include <net/net_namespace.h>
+#include <net/sock.h>
+#include <net/devlink.h>
+
+static LIST_HEAD(devlink_list);
+
+/* devlink_mutex
+ *
+ * An overall lock guarding every operation comming from userspace.
+ * If also guards devlink devices list and it is taken when
+ * driver registers/unregisters it.
+ */
+static DEFINE_MUTEX(devlink_mutex);
+
+/* devlink_port_mutex
+ *
+ * Shared lock to guard lists of ports in all devlink devices.
+ */
+static DEFINE_MUTEX(devlink_port_mutex);
+
+static struct net *devlink_net(const struct devlink *devlink)
+{
+	return read_pnet(&devlink->_net);
+}
+
+static void devlink_net_set(struct devlink *devlink, struct net *net)
+{
+	write_pnet(&devlink->_net, net);
+}
+
+static bool devlink_name_exists(struct net *net, const char *name)
+{
+	struct devlink *devlink;
+
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (strcmp(devlink_name(devlink), name) == 0 &&
+		    net_eq(devlink_net(devlink), net))
+			return true;
+	}
+	return false;
+}
+
+static struct devlink *devlink_get_by_index(struct net *net, int index)
+{
+	struct devlink *devlink;
+
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (devlink->index == index &&
+		    net_eq(devlink_net(devlink), net))
+			return devlink;
+	}
+	return NULL;
+}
+
+static struct devlink *devlink_get_from_attrs(struct net *net,
+					      struct nlattr **attrs)
+{
+	if (attrs[DEVLINK_ATTR_INDEX]) {
+		u32 index = nla_get_u32(attrs[DEVLINK_ATTR_INDEX]);
+		struct devlink *devlink;
+
+		devlink = devlink_get_by_index(net, index);
+		if (!devlink)
+			return ERR_PTR(-ENODEV);
+		return devlink;
+	}
+	return ERR_PTR(-EINVAL);
+}
+
+static struct devlink *devlink_get_from_info(struct genl_info *info)
+{
+	return devlink_get_from_attrs(genl_info_net(info), info->attrs);
+}
+
+static struct devlink_port *devlink_port_get_by_index(struct devlink *devlink,
+						      int port_index)
+{
+	struct devlink_port *devlink_port;
+
+	list_for_each_entry(devlink_port, &devlink->port_list, list) {
+		if (devlink_port->index == port_index)
+			return devlink_port;
+	}
+	return NULL;
+}
+
+static bool devlink_port_index_exists(struct devlink *devlink, int port_index)
+{
+	return devlink_port_get_by_index(devlink, port_index);
+}
+
+static struct devlink_port *devlink_port_get_from_attrs(struct devlink *devlink,
+							struct nlattr **attrs)
+{
+	if (attrs[DEVLINK_ATTR_PORT_INDEX]) {
+		u32 port_index = nla_get_u32(attrs[DEVLINK_ATTR_PORT_INDEX]);
+		struct devlink_port *devlink_port;
+
+		devlink_port = devlink_port_get_by_index(devlink, port_index);
+		if (!devlink_port)
+			return ERR_PTR(-ENODEV);
+		return devlink_port;
+	}
+	return ERR_PTR(-EINVAL);
+}
+
+static struct devlink_port *devlink_port_get_from_info(struct devlink *devlink,
+						       struct genl_info *info)
+{
+	return devlink_port_get_from_attrs(devlink, info->attrs);
+}
+
+#define DEVLINK_NL_FLAG_NEED_PORT	BIT(0)
+
+static int devlink_nl_pre_doit(const struct genl_ops *ops,
+			       struct sk_buff *skb, struct genl_info *info)
+{
+	struct devlink *devlink;
+
+	mutex_lock(&devlink_mutex);
+	devlink = devlink_get_from_info(info);
+	if (IS_ERR(devlink)) {
+		mutex_unlock(&devlink_mutex);
+		return PTR_ERR(devlink);
+	}
+	info->user_ptr[0] = devlink;
+	if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) {
+		struct devlink_port *devlink_port;
+
+		mutex_lock(&devlink_port_mutex);
+		devlink_port = devlink_port_get_from_info(devlink, info);
+		if (IS_ERR(devlink_port)) {
+			mutex_unlock(&devlink_port_mutex);
+			mutex_unlock(&devlink_mutex);
+			return PTR_ERR(devlink_port);
+		}
+		info->user_ptr[1] = devlink_port;
+	}
+	return 0;
+}
+
+static void devlink_nl_post_doit(const struct genl_ops *ops,
+				 struct sk_buff *skb, struct genl_info *info)
+{
+	if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT)
+		mutex_unlock(&devlink_port_mutex);
+	mutex_unlock(&devlink_mutex);
+}
+
+static struct genl_family devlink_nl_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= DEVLINK_GENL_NAME,
+	.version	= DEVLINK_GENL_VERSION,
+	.maxattr	= DEVLINK_ATTR_MAX,
+	.netnsok	= true,
+	.pre_doit	= devlink_nl_pre_doit,
+	.post_doit	= devlink_nl_post_doit,
+};
+
+enum devlink_multicast_groups {
+	DEVLINK_MCGRP_CONFIG,
+};
+
+static const struct genl_multicast_group devlink_nl_mcgrps[] = {
+	[DEVLINK_MCGRP_CONFIG] = { .name = DEVLINK_GENL_MCGRP_CONFIG_NAME },
+};
+
+static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink,
+			   enum devlink_command cmd, u32 portid,
+			   u32 seq, int flags)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (nla_put_u32(msg, DEVLINK_ATTR_INDEX, devlink->index))
+		goto nla_put_failure;
+	if (nla_put_string(msg, DEVLINK_ATTR_NAME, devlink_name(devlink)))
+		goto nla_put_failure;
+
+	if (devlink->dev.parent) {
+		struct device *dev = devlink->dev.parent;
+
+		if (nla_put_string(msg, DEVLINK_ATTR_BUS_NAME, dev->bus->name))
+			goto nla_put_failure;
+		if (nla_put_string(msg, DEVLINK_ATTR_DEV_NAME, dev_name(dev)))
+			goto nla_put_failure;
+	}
+
+	genlmsg_end(msg, hdr);
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+static void devlink_notify(struct devlink *devlink, enum devlink_command cmd)
+{
+	struct sk_buff *msg;
+	int err;
+
+	WARN_ON(cmd != DEVLINK_CMD_NEW && cmd != DEVLINK_CMD_DEL);
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return;
+
+	err = devlink_nl_fill(msg, devlink, cmd, 0, 0, 0);
+	if (err) {
+		nlmsg_free(msg);
+		return;
+	}
+
+	genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+				msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
+}
+
+static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
+				struct devlink_port *devlink_port,
+				enum devlink_command cmd, u32 portid,
+				u32 seq, int flags)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (nla_put_u32(msg, DEVLINK_ATTR_INDEX, devlink->index))
+		goto nla_put_failure;
+	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index))
+		goto nla_put_failure;
+	if (nla_put_u16(msg, DEVLINK_ATTR_PORT_TYPE, devlink_port->type))
+		goto nla_put_failure;
+	if (devlink_port->desired_type != DEVLINK_PORT_TYPE_NOTSET &&
+	    nla_put_u16(msg, DEVLINK_ATTR_PORT_DESIRED_TYPE,
+			devlink_port->desired_type))
+		goto nla_put_failure;
+	if (devlink_port->type == DEVLINK_PORT_TYPE_ETH) {
+		struct net_device *netdev = devlink_port->type_dev;
+
+		if (netdev &&
+		    (nla_put_u32(msg, DEVLINK_ATTR_PORT_NETDEV_IFINDEX,
+				 netdev->ifindex) ||
+		     nla_put_string(msg, DEVLINK_ATTR_PORT_NETDEV_NAME,
+				    netdev->name)))
+			goto nla_put_failure;
+	}
+	if (devlink_port->type == DEVLINK_PORT_TYPE_IB) {
+		struct ib_device *ibdev = devlink_port->type_dev;
+
+		if (ibdev &&
+		    nla_put_string(msg, DEVLINK_ATTR_PORT_IBDEV_NAME,
+				   ibdev->name))
+			goto nla_put_failure;
+	}
+	if (devlink_port->split &&
+	    nla_put_u32(msg, DEVLINK_ATTR_PORT_SPLIT_GROUP,
+			devlink_port->split_group))
+		goto nla_put_failure;
+
+	genlmsg_end(msg, hdr);
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+static void devlink_port_notify(struct devlink_port *devlink_port,
+				enum devlink_command cmd)
+{
+	struct devlink *devlink = devlink_port->devlink;
+	struct sk_buff *msg;
+	int err;
+
+	if (!devlink_port->registered)
+		return;
+
+	WARN_ON(cmd != DEVLINK_CMD_PORT_NEW && cmd != DEVLINK_CMD_PORT_DEL);
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return;
+
+	err = devlink_nl_port_fill(msg, devlink, devlink_port, cmd, 0, 0, 0);
+	if (err) {
+		nlmsg_free(msg);
+		return;
+	}
+
+	genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+				msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
+}
+
+static int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct sk_buff *msg;
+	int err;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW,
+			      info->snd_portid, info->snd_seq, 0);
+	if (err) {
+		nlmsg_free(msg);
+		return err;
+	}
+
+	return genlmsg_reply(msg, info);
+}
+
+static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg,
+				     struct netlink_callback *cb)
+{
+	struct devlink *devlink;
+	int start = cb->args[0];
+	int idx = 0;
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			continue;
+		if (idx < start) {
+			idx++;
+			continue;
+		}
+		err = devlink_nl_fill(msg, devlink, DEVLINK_CMD_NEW,
+				      NETLINK_CB(cb->skb).portid,
+				      cb->nlh->nlmsg_seq, NLM_F_MULTI);
+		if (err)
+			goto out;
+		idx++;
+	}
+out:
+	mutex_unlock(&devlink_mutex);
+
+	cb->args[0] = idx;
+	return msg->len;
+}
+
+static int devlink_rename(struct devlink *devlink, const char *newname)
+{
+	int err;
+
+	if (strcmp(newname, devlink_name(devlink)) == 0)
+		return 0;
+	if (devlink_name_exists(devlink_net(devlink), newname))
+		return -EINVAL;
+	if (!dev_valid_name(newname))
+		return -EINVAL;
+	err = device_rename(&devlink->dev, newname);
+	if (err)
+		return err;
+	devlink_notify(devlink, DEVLINK_CMD_NEW);
+	return 0;
+}
+
+static int devlink_nl_cmd_set_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	int err;
+
+	if (info->attrs[DEVLINK_ATTR_NAME]) {
+		err = devlink_rename(devlink,
+				     nla_data(info->attrs[DEVLINK_ATTR_NAME]));
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_port *devlink_port = info->user_ptr[1];
+	struct sk_buff *msg;
+	int err;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	err = devlink_nl_port_fill(msg, devlink, devlink_port,
+				   DEVLINK_CMD_PORT_NEW,
+				   info->snd_portid, info->snd_seq, 0);
+	if (err) {
+		nlmsg_free(msg);
+		return err;
+	}
+
+	return genlmsg_reply(msg, info);
+}
+
+static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg,
+					  struct netlink_callback *cb)
+{
+	struct devlink *devlink;
+	struct devlink_port *devlink_port;
+	int start = cb->args[0];
+	int idx = 0;
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	mutex_lock(&devlink_port_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			continue;
+		list_for_each_entry(devlink_port, &devlink->port_list, list) {
+			if (idx < start) {
+				idx++;
+				continue;
+			}
+			err = devlink_nl_port_fill(msg, devlink, devlink_port,
+						   DEVLINK_CMD_NEW,
+						   NETLINK_CB(cb->skb).portid,
+						   cb->nlh->nlmsg_seq,
+						   NLM_F_MULTI);
+			if (err)
+				goto out;
+			idx++;
+		}
+	}
+out:
+	mutex_unlock(&devlink_port_mutex);
+	mutex_unlock(&devlink_mutex);
+
+	cb->args[0] = idx;
+	return msg->len;
+}
+
+static int devlink_port_type_set(struct devlink *devlink,
+				 struct devlink_port *devlink_port,
+				 enum devlink_port_type port_type)
+
+{
+	int err;
+
+	if (devlink->ops && devlink->ops->port_type_set) {
+		if (port_type == DEVLINK_PORT_TYPE_NOTSET)
+			return -EINVAL;
+		err = devlink->ops->port_type_set(devlink_port, port_type);
+		if (err)
+			return err;
+		devlink_port->desired_type = port_type;
+		devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
+		return 0;
+	}
+	return -EOPNOTSUPP;
+}
+
+static int devlink_nl_cmd_port_set_doit(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_port *devlink_port = info->user_ptr[1];
+	int err;
+
+	if (info->attrs[DEVLINK_ATTR_PORT_TYPE]) {
+		enum devlink_port_type port_type;
+
+		port_type = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_TYPE]);
+		err = devlink_port_type_set(devlink, devlink_port, port_type);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static int devlink_port_split(struct devlink *devlink,
+			      u32 port_index, u32 count)
+
+{
+	if (devlink->ops && devlink->ops->port_split)
+		return devlink->ops->port_split(devlink, port_index, count);
+	return -EOPNOTSUPP;
+}
+
+static int devlink_nl_cmd_port_split_doit(struct sk_buff *skb,
+					  struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	u32 port_index;
+	u32 count;
+
+	if (!info->attrs[DEVLINK_ATTR_PORT_INDEX] ||
+	    !info->attrs[DEVLINK_ATTR_PORT_SPLIT_COUNT])
+		return -EINVAL;
+
+	port_index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]);
+	count = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_SPLIT_COUNT]);
+	return devlink_port_split(devlink, port_index, count);
+}
+
+static int devlink_port_unsplit(struct devlink *devlink, u32 port_index)
+
+{
+	if (devlink->ops && devlink->ops->port_unsplit)
+		return devlink->ops->port_unsplit(devlink, port_index);
+	return -EOPNOTSUPP;
+}
+
+static int devlink_nl_cmd_port_unsplit_doit(struct sk_buff *skb,
+					    struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	u32 port_index;
+
+	if (!info->attrs[DEVLINK_ATTR_PORT_INDEX])
+		return -EINVAL;
+
+	port_index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]);
+	return devlink_port_unsplit(devlink, port_index);
+}
+
+static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
+	[DEVLINK_ATTR_INDEX] = { .type = NLA_U32 },
+	[DEVLINK_ATTR_NAME] = { .type = NLA_NUL_STRING,
+				.len = DEVLINK_ATTR_NAME_MAX_LEN },
+	[DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32 },
+	[DEVLINK_ATTR_PORT_TYPE] = { .type = NLA_U16 },
+	[DEVLINK_ATTR_PORT_SPLIT_COUNT] = { .type = NLA_U32 },
+};
+
+static const struct genl_ops devlink_nl_ops[] = {
+	{
+		.cmd = DEVLINK_CMD_GET,
+		.doit = devlink_nl_cmd_get_doit,
+		.dumpit = devlink_nl_cmd_get_dumpit,
+		.policy = devlink_nl_policy,
+		/* can be retrieved by unprivileged users */
+	},
+	{
+		.cmd = DEVLINK_CMD_SET,
+		.doit = devlink_nl_cmd_set_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = DEVLINK_CMD_PORT_GET,
+		.doit = devlink_nl_cmd_port_get_doit,
+		.dumpit = devlink_nl_cmd_port_get_dumpit,
+		.policy = devlink_nl_policy,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_PORT,
+		/* can be retrieved by unprivileged users */
+	},
+	{
+		.cmd = DEVLINK_CMD_PORT_SET,
+		.doit = devlink_nl_cmd_port_set_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_PORT,
+	},
+	{
+		.cmd = DEVLINK_CMD_PORT_SPLIT,
+		.doit = devlink_nl_cmd_port_split_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = DEVLINK_CMD_PORT_UNSPLIT,
+		.doit = devlink_nl_cmd_port_unsplit_doit,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
+static struct class devlink_class;
+
+/**
+ *	devlink_alloc - Allocate new devlink instance resources
+ *
+ *	@ops: ops
+ *	@priv_size: size of user private data
+ *
+ *	Allocate new devlink instance resources, including devlink index
+ *	and name.
+ */
+struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
+{
+	static atomic_t dev_counter = ATOMIC_INIT(0);
+	struct devlink *devlink;
+
+	devlink = kzalloc(sizeof(*devlink) + priv_size, GFP_KERNEL);
+	if (!devlink)
+		return NULL;
+	devlink->ops = ops;
+	devlink_net_set(devlink, &init_net);
+
+different_name:
+	devlink->index = atomic_inc_return(&dev_counter);
+	if (devlink->index < 0) {
+		/* wrapped */
+		atomic_dec(&dev_counter);
+		kfree(devlink);
+		return NULL;
+	}
+	/* atomic_inc_return makes it start at 1, make it start at 0 */
+	devlink->index--;
+
+	dev_set_name(&devlink->dev, DEVLINK_GENL_NAME "%d", devlink->index);
+	if (devlink_name_exists(devlink_net(devlink), devlink_name(devlink)))
+		goto different_name;
+
+	INIT_LIST_HEAD(&devlink->port_list);
+	device_initialize(&devlink->dev);
+	devlink->dev.class = &devlink_class;
+	devlink->dev.platform_data = devlink;
+	return devlink;
+}
+EXPORT_SYMBOL_GPL(devlink_alloc);
+
+/**
+ *	devlink_register - Register devlink instance
+ *
+ *	@devlink: devlink
+ */
+int devlink_register(struct devlink *devlink)
+{
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	err = device_add(&devlink->dev);
+	if (err)
+		goto unlock;
+	list_add_tail(&devlink->list, &devlink_list);
+	devlink_notify(devlink, DEVLINK_CMD_NEW);
+unlock:
+	mutex_unlock(&devlink_mutex);
+	return err;
+}
+EXPORT_SYMBOL_GPL(devlink_register);
+
+/**
+ *	devlink_unregister - Unregister devlink instance
+ *
+ *	@devlink: devlink
+ */
+void devlink_unregister(struct devlink *devlink)
+{
+	mutex_lock(&devlink_mutex);
+	devlink_notify(devlink, DEVLINK_CMD_DEL);
+	list_del(&devlink->list);
+	device_del(&devlink->dev);
+	mutex_unlock(&devlink_mutex);
+}
+EXPORT_SYMBOL_GPL(devlink_unregister);
+
+/**
+ *	devlink_free - Free devlink instance resources
+ *
+ *	@devlink: devlink
+ */
+void devlink_free(struct devlink *devlink)
+{
+	put_device(&devlink->dev);
+}
+EXPORT_SYMBOL_GPL(devlink_free);
+
+/**
+ *	devlink_port_register - Register devlink port
+ *
+ *	@devlink: devlink
+ *	@devlink_port: devlink port
+ *	@port_index
+ *
+ *	Register devlink port with provided port index. User can use
+ *	any indexing, even hw-related one. devlink_port structure
+ *	is convenient to be embedded inside user driver private structure.
+ *	Note that the caller should take care of zeroing the devlink_port
+ *	structure.
+ */
+int devlink_port_register(struct devlink *devlink,
+			  struct devlink_port *devlink_port,
+			  unsigned int port_index)
+{
+	mutex_lock(&devlink_port_mutex);
+	if (devlink_port_index_exists(devlink, port_index)) {
+		mutex_unlock(&devlink_port_mutex);
+		return -EEXIST;
+	}
+	devlink_port->devlink = devlink;
+	devlink_port->index = port_index;
+	devlink_port->type = DEVLINK_PORT_TYPE_NOTSET;
+	devlink_port->registered = true;
+	list_add_tail(&devlink_port->list, &devlink->port_list);
+	mutex_unlock(&devlink_port_mutex);
+	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_port_register);
+
+/**
+ *	devlink_port_unregister - Unregister devlink port
+ *
+ *	@devlink_port: devlink port
+ */
+void devlink_port_unregister(struct devlink_port *devlink_port)
+{
+	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_DEL);
+	mutex_lock(&devlink_port_mutex);
+	list_del(&devlink_port->list);
+	mutex_unlock(&devlink_port_mutex);
+}
+EXPORT_SYMBOL_GPL(devlink_port_unregister);
+
+static void __devlink_port_type_set(struct devlink_port *devlink_port,
+				    enum devlink_port_type type,
+				    void *type_dev)
+{
+	devlink_port->type = type;
+	devlink_port->type_dev = type_dev;
+	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
+}
+
+/**
+ *	devlink_port_type_eth_set - Set port type to Ethernet
+ *
+ *	@devlink_port: devlink port
+ *	@netdev: related netdevice
+ */
+void devlink_port_type_eth_set(struct devlink_port *devlink_port,
+			       struct net_device *netdev)
+{
+	return __devlink_port_type_set(devlink_port,
+				       DEVLINK_PORT_TYPE_ETH, netdev);
+}
+EXPORT_SYMBOL_GPL(devlink_port_type_eth_set);
+
+/**
+ *	devlink_port_type_ib_set - Set port type to InfiniBand
+ *
+ *	@devlink_port: devlink port
+ *	@ibdev: related IB device
+ */
+void devlink_port_type_ib_set(struct devlink_port *devlink_port,
+			      struct ib_device *ibdev)
+{
+	return __devlink_port_type_set(devlink_port,
+				       DEVLINK_PORT_TYPE_IB, ibdev);
+}
+EXPORT_SYMBOL_GPL(devlink_port_type_ib_set);
+
+/**
+ *	devlink_port_type_clear - Clear port type
+ *
+ *	@devlink_port: devlink port
+ */
+void devlink_port_type_clear(struct devlink_port *devlink_port)
+{
+	return __devlink_port_type_set(devlink_port,
+				       DEVLINK_PORT_TYPE_NOTSET, NULL);
+}
+EXPORT_SYMBOL_GPL(devlink_port_type_clear);
+
+/**
+ *	devlink_port_split_set - Set port is splitted
+ *
+ *	@devlink_port: devlink port
+ *	@split_group: split group - identifies group splitted port is part of
+ */
+void devlink_port_split_set(struct devlink_port *devlink_port,
+			    u32 split_group)
+{
+	devlink_port->split = true;;
+	devlink_port->split_group = split_group;
+	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
+}
+EXPORT_SYMBOL_GPL(devlink_port_split_set);
+
+static void __devlink_free(struct devlink *devlink)
+{
+	kfree(devlink);
+}
+
+static struct devlink *dev_to_devlink(struct device *dev)
+{
+	return container_of(dev, struct devlink, dev);
+}
+
+static ssize_t index_show(struct device *dev,
+			  struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", dev_to_devlink(dev)->index);
+}
+static DEVICE_ATTR_RO(index);
+
+static ssize_t name_show(struct device *dev,
+			 struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%s\n", dev_name(dev));
+}
+static DEVICE_ATTR_RO(name);
+
+static struct attribute *devlink_attrs[] = {
+	&dev_attr_index.attr,
+	&dev_attr_name.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(devlink);
+
+static void devlink_dev_release(struct device *dev)
+{
+	__devlink_free(dev_to_devlink(dev));
+}
+
+static const void *devlink_namespace(struct device *dev)
+{
+	return devlink_net(dev_to_devlink(dev));
+}
+
+static struct class devlink_class = {
+	.name = DEVLINK_GENL_NAME,
+	.owner = THIS_MODULE,
+	.dev_release = devlink_dev_release,
+	.dev_groups = devlink_groups,
+	.ns_type = &net_ns_type_operations,
+	.namespace = devlink_namespace,
+};
+
+static int __init devlink_module_init(void)
+{
+	int err;
+
+	err = class_register(&devlink_class);
+	if (err)
+		return err;
+	err = genl_register_family_with_ops_groups(&devlink_nl_family,
+						   devlink_nl_ops,
+						   devlink_nl_mcgrps);
+	if (err)
+		goto err_genl_register;
+	return 0;
+
+err_genl_register:
+	class_unregister(&devlink_class);
+	return err;
+}
+
+static void __exit devlink_module_exit(void)
+{
+	genl_unregister_family(&devlink_nl_family);
+	class_unregister(&devlink_class);
+}
+
+module_init(devlink_module_init);
+module_exit(devlink_module_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jiri Pirko <jiri@mellanox.com>");
+MODULE_DESCRIPTION("Network physical device Netlink interface");
+MODULE_ALIAS_GENL_FAMILY(DEVLINK_GENL_NAME);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [patch net-next 2/9] mlx4: Implement devlink interface
  2016-02-22 18:31 [patch net-next 0/9] Introduce devlink interface and first drivers to use it Jiri Pirko
  2016-02-22 18:31 ` [patch net-next 1/9] Introduce devlink infrastructure Jiri Pirko
@ 2016-02-22 18:31 ` Jiri Pirko
  2016-02-22 18:31 ` [patch net-next 3/9] mlx4: Implement port type setting via " Jiri Pirko
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, brouer, ivecera, rami.rosen

From: Jiri Pirko <jiri@mellanox.com>

Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/infiniband/hw/mlx4/main.c              |  7 ++++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  8 ++++-
 drivers/net/ethernet/mellanox/mlx4/intf.c      |  9 ++++++
 drivers/net/ethernet/mellanox/mlx4/main.c      | 45 +++++++++++++++++++-------
 drivers/net/ethernet/mellanox/mlx4/mlx4.h      |  2 ++
 include/linux/mlx4/driver.h                    |  3 ++
 6 files changed, 61 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 1c7ab6c..a15a7b3 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -41,6 +41,7 @@
 #include <linux/if_vlan.h>
 #include <net/ipv6.h>
 #include <net/addrconf.h>
+#include <net/devlink.h>
 
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
@@ -2519,6 +2520,9 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	}
 
 	ibdev->ib_active = true;
+	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
+		devlink_port_type_ib_set(mlx4_get_devlink_port(dev, i),
+					 &ibdev->ib_dev);
 
 	if (mlx4_is_mfunc(ibdev->dev))
 		init_pkeys(ibdev);
@@ -2643,7 +2647,10 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void *ibdev_ptr)
 {
 	struct mlx4_ib_dev *ibdev = ibdev_ptr;
 	int p;
+	int i;
 
+	mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
+		devlink_port_type_clear(mlx4_get_devlink_port(dev, i));
 	ibdev->ib_active = false;
 	flush_workqueue(wq);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 01d6a96..ab6eeb0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -40,6 +40,7 @@
 #include <net/ip.h>
 #include <net/busy_poll.h>
 #include <net/vxlan.h>
+#include <net/devlink.h>
 
 #include <linux/mlx4/driver.h>
 #include <linux/mlx4/device.h>
@@ -2033,8 +2034,11 @@ void mlx4_en_destroy_netdev(struct net_device *dev)
 	en_dbg(DRV, priv, "Destroying netdev on port:%d\n", priv->port);
 
 	/* Unregister device - this will close the port if it was up */
-	if (priv->registered)
+	if (priv->registered) {
+		devlink_port_type_clear(mlx4_get_devlink_port(mdev->dev,
+							      priv->port));
 		unregister_netdev(dev);
+	}
 
 	if (priv->allocated)
 		mlx4_free_hwq_res(mdev->dev, &priv->res, MLX4_EN_PAGE_SIZE);
@@ -3050,6 +3054,8 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	}
 
 	priv->registered = 1;
+	devlink_port_type_eth_set(mlx4_get_devlink_port(mdev->dev, priv->port),
+				  dev);
 
 	return 0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c b/drivers/net/ethernet/mellanox/mlx4/intf.c
index 0472941..dec77d6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -34,6 +34,7 @@
 #include <linux/slab.h>
 #include <linux/export.h>
 #include <linux/errno.h>
+#include <net/devlink.h>
 
 #include "mlx4.h"
 
@@ -249,3 +250,11 @@ void *mlx4_get_protocol_dev(struct mlx4_dev *dev, enum mlx4_protocol proto, int
 	return result;
 }
 EXPORT_SYMBOL_GPL(mlx4_get_protocol_dev);
+
+struct devlink_port *mlx4_get_devlink_port(struct mlx4_dev *dev, int port)
+{
+	struct mlx4_port_info *info = &mlx4_priv(dev)->port[port];
+
+	return &info->devlink_port;
+}
+EXPORT_SYMBOL_GPL(mlx4_get_devlink_port);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index f1b6d21..a5f54a5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -42,6 +42,7 @@
 #include <linux/io-mapping.h>
 #include <linux/delay.h>
 #include <linux/kmod.h>
+#include <net/devlink.h>
 
 #include <linux/mlx4/device.h>
 #include <linux/mlx4/doorbell.h>
@@ -2847,8 +2848,13 @@ no_msi:
 
 static int mlx4_init_port_info(struct mlx4_dev *dev, int port)
 {
+	struct devlink *devlink = priv_to_devlink(mlx4_priv(dev));
 	struct mlx4_port_info *info = &mlx4_priv(dev)->port[port];
-	int err = 0;
+	int err;
+
+	err = devlink_port_register(devlink, &info->devlink_port, port);
+	if (err)
+		return err;
 
 	info->dev = dev;
 	info->port = port;
@@ -2873,6 +2879,7 @@ static int mlx4_init_port_info(struct mlx4_dev *dev, int port)
 	err = device_create_file(&dev->persist->pdev->dev, &info->port_attr);
 	if (err) {
 		mlx4_err(dev, "Failed to create file for port %d\n", port);
+		devlink_port_unregister(&info->devlink_port);
 		info->port = -1;
 	}
 
@@ -3646,21 +3653,23 @@ err_disable_pdev:
 
 static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 {
+	struct devlink *devlink;
 	struct mlx4_priv *priv;
 	struct mlx4_dev *dev;
 	int ret;
 
 	printk_once(KERN_INFO "%s", mlx4_version);
 
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (!priv)
+	devlink = devlink_alloc(NULL, sizeof(*priv));
+	if (!devlink)
 		return -ENOMEM;
+	priv = devlink_priv(devlink);
 
 	dev       = &priv->dev;
 	dev->persist = kzalloc(sizeof(*dev->persist), GFP_KERNEL);
 	if (!dev->persist) {
-		kfree(priv);
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto err_devlink_free;
 	}
 	dev->persist->pdev = pdev;
 	dev->persist->dev = dev;
@@ -3669,14 +3678,24 @@ static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	mutex_init(&dev->persist->device_state_mutex);
 	mutex_init(&dev->persist->interface_state_mutex);
 
+	set_devlink_dev(devlink, &pdev->dev);
+	ret = devlink_register(devlink);
+	if (ret)
+		goto err_persist_free;
+
 	ret =  __mlx4_init_one(pdev, id->driver_data, priv);
-	if (ret) {
-		kfree(dev->persist);
-		kfree(priv);
-	} else {
-		pci_save_state(pdev);
-	}
+	if (ret)
+		goto err_devlink_unregister;
 
+	pci_save_state(pdev);
+	return 0;
+
+err_devlink_unregister:
+	devlink_unregister(devlink);
+err_persist_free:
+	kfree(dev->persist);
+err_devlink_free:
+	devlink_free(devlink);
 	return ret;
 }
 
@@ -3777,6 +3796,7 @@ static void mlx4_remove_one(struct pci_dev *pdev)
 	struct mlx4_dev_persistent *persist = pci_get_drvdata(pdev);
 	struct mlx4_dev  *dev  = persist->dev;
 	struct mlx4_priv *priv = mlx4_priv(dev);
+	struct devlink *devlink = priv_to_devlink(priv);
 	int active_vfs = 0;
 
 	mutex_lock(&persist->interface_state_mutex);
@@ -3807,8 +3827,9 @@ static void mlx4_remove_one(struct pci_dev *pdev)
 
 	pci_release_regions(pdev);
 	pci_disable_device(pdev);
+	devlink_unregister(devlink);
 	kfree(dev->persist);
-	kfree(priv);
+	devlink_free(devlink);
 	pci_set_drvdata(pdev, NULL);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index 7baef52..ef96831 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -45,6 +45,7 @@
 #include <linux/workqueue.h>
 #include <linux/interrupt.h>
 #include <linux/spinlock.h>
+#include <net/devlink.h>
 
 #include <linux/mlx4/device.h>
 #include <linux/mlx4/driver.h>
@@ -828,6 +829,7 @@ struct mlx4_port_info {
 	struct mlx4_roce_gid_table gid_table;
 	int			base_qpn;
 	struct cpu_rmap		*rmap;
+	struct devlink_port	devlink_port;
 };
 
 struct mlx4_sense {
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 2e8af00..bd0e707 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -33,6 +33,7 @@
 #ifndef MLX4_DRIVER_H
 #define MLX4_DRIVER_H
 
+#include <net/devlink.h>
 #include <linux/mlx4/device.h>
 
 struct mlx4_dev;
@@ -89,6 +90,8 @@ int mlx4_port_map_set(struct mlx4_dev *dev, struct mlx4_port_map *v2p);
 
 void *mlx4_get_protocol_dev(struct mlx4_dev *dev, enum mlx4_protocol proto, int port);
 
+struct devlink_port *mlx4_get_devlink_port(struct mlx4_dev *dev, int port);
+
 static inline u64 mlx4_mac_to_u64(u8 *addr)
 {
 	u64 mac = 0;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-22 18:31 [patch net-next 0/9] Introduce devlink interface and first drivers to use it Jiri Pirko
  2016-02-22 18:31 ` [patch net-next 1/9] Introduce devlink infrastructure Jiri Pirko
  2016-02-22 18:31 ` [patch net-next 2/9] mlx4: Implement devlink interface Jiri Pirko
@ 2016-02-22 18:31 ` Jiri Pirko
  2016-02-23 11:26   ` Hannes Frederic Sowa
  2016-02-22 18:31 ` [patch net-next 4/9] mlxsw: Implement " Jiri Pirko
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2016-02-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, brouer, ivecera, rami.rosen

From: Jiri Pirko <jiri@mellanox.com>

So far, there has been an mlx4-specific sysfs file allowing user to
change port type to either Ethernet of InfiniBand. This is very
inconvenient.

Allow to expose the same ability to set port type in a generic way
using devlink interface.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 86 +++++++++++++++++++++++--------
 1 file changed, 65 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index a5f54a5..4bac714 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1052,36 +1052,20 @@ static ssize_t show_port_type(struct device *dev,
 	return strlen(buf);
 }
 
-static ssize_t set_port_type(struct device *dev,
-			     struct device_attribute *attr,
-			     const char *buf, size_t count)
+static int __set_port_type(struct mlx4_port_info *info,
+			   enum mlx4_port_type port_type)
 {
-	struct mlx4_port_info *info = container_of(attr, struct mlx4_port_info,
-						   port_attr);
 	struct mlx4_dev *mdev = info->dev;
 	struct mlx4_priv *priv = mlx4_priv(mdev);
 	enum mlx4_port_type types[MLX4_MAX_PORTS];
 	enum mlx4_port_type new_types[MLX4_MAX_PORTS];
-	static DEFINE_MUTEX(set_port_type_mutex);
 	int i;
 	int err = 0;
 
-	mutex_lock(&set_port_type_mutex);
-
-	if (!strcmp(buf, "ib\n"))
-		info->tmp_type = MLX4_PORT_TYPE_IB;
-	else if (!strcmp(buf, "eth\n"))
-		info->tmp_type = MLX4_PORT_TYPE_ETH;
-	else if (!strcmp(buf, "auto\n"))
-		info->tmp_type = MLX4_PORT_TYPE_AUTO;
-	else {
-		mlx4_err(mdev, "%s is not supported port type\n", buf);
-		err = -EINVAL;
-		goto err_out;
-	}
-
 	mlx4_stop_sense(mdev);
 	mutex_lock(&priv->port_mutex);
+	info->tmp_type = port_type;
+
 	/* Possible type is always the one that was delivered */
 	mdev->caps.possible_type[info->port] = info->tmp_type;
 
@@ -1123,6 +1107,37 @@ static ssize_t set_port_type(struct device *dev,
 out:
 	mlx4_start_sense(mdev);
 	mutex_unlock(&priv->port_mutex);
+
+	return err;
+}
+
+static ssize_t set_port_type(struct device *dev,
+			     struct device_attribute *attr,
+			     const char *buf, size_t count)
+{
+	struct mlx4_port_info *info = container_of(attr, struct mlx4_port_info,
+						   port_attr);
+	struct mlx4_dev *mdev = info->dev;
+	enum mlx4_port_type port_type;
+	static DEFINE_MUTEX(set_port_type_mutex);
+	int err;
+
+	mutex_lock(&set_port_type_mutex);
+
+	if (!strcmp(buf, "ib\n")) {
+		port_type = MLX4_PORT_TYPE_IB;
+	} else if (!strcmp(buf, "eth\n")) {
+		port_type = MLX4_PORT_TYPE_ETH;
+	} else if (!strcmp(buf, "auto\n")) {
+		port_type = MLX4_PORT_TYPE_AUTO;
+	} else {
+		mlx4_err(mdev, "%s is not supported port type\n", buf);
+		err = -EINVAL;
+		goto err_out;
+	}
+
+	err = __set_port_type(info, port_type);
+
 err_out:
 	mutex_unlock(&set_port_type_mutex);
 
@@ -3651,6 +3666,35 @@ err_disable_pdev:
 	return err;
 }
 
+static int mlx4_devlink_port_type_set(struct devlink_port *devlink_port,
+				      enum devlink_port_type port_type)
+{
+	struct mlx4_port_info *info = container_of(devlink_port,
+						   struct mlx4_port_info,
+						   devlink_port);
+	enum mlx4_port_type mlx4_port_type;
+
+	switch (port_type) {
+	case DEVLINK_PORT_TYPE_AUTO:
+		mlx4_port_type = MLX4_PORT_TYPE_AUTO;
+		break;
+	case DEVLINK_PORT_TYPE_ETH:
+		mlx4_port_type = MLX4_PORT_TYPE_ETH;
+		break;
+	case DEVLINK_PORT_TYPE_IB:
+		mlx4_port_type = MLX4_PORT_TYPE_IB;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return __set_port_type(info, mlx4_port_type);
+}
+
+static const struct devlink_ops mlx4_devlink_ops = {
+	.port_type_set	= mlx4_devlink_port_type_set,
+};
+
 static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct devlink *devlink;
@@ -3660,7 +3704,7 @@ static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	printk_once(KERN_INFO "%s", mlx4_version);
 
-	devlink = devlink_alloc(NULL, sizeof(*priv));
+	devlink = devlink_alloc(&mlx4_devlink_ops, sizeof(*priv));
 	if (!devlink)
 		return -ENOMEM;
 	priv = devlink_priv(devlink);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [patch net-next 4/9] mlxsw: Implement devlink interface
  2016-02-22 18:31 [patch net-next 0/9] Introduce devlink interface and first drivers to use it Jiri Pirko
                   ` (2 preceding siblings ...)
  2016-02-22 18:31 ` [patch net-next 3/9] mlx4: Implement port type setting via " Jiri Pirko
@ 2016-02-22 18:31 ` Jiri Pirko
  2016-02-22 18:32 ` [patch net-next 5/9] mlxsw: core: Add devlink port splitter callbacks Jiri Pirko
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-22 18:31 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, brouer, ivecera, rami.rosen

From: Jiri Pirko <jiri@mellanox.com>

Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c     | 24 ++++++++++++++++++------
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 20 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  2 ++
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 20 ++++++++++++++++++++
 4 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 22379eb..57d9655 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -56,6 +56,7 @@
 #include <linux/rcupdate.h>
 #include <linux/slab.h>
 #include <asm/byteorder.h>
+#include <net/devlink.h>
 
 #include "core.h"
 #include "item.h"
@@ -791,6 +792,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	const char *device_kind = mlxsw_bus_info->device_kind;
 	struct mlxsw_core *mlxsw_core;
 	struct mlxsw_driver *mlxsw_driver;
+	struct devlink *devlink;
 	size_t alloc_size;
 	int err;
 
@@ -798,12 +800,13 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (!mlxsw_driver)
 		return -EINVAL;
 	alloc_size = sizeof(*mlxsw_core) + mlxsw_driver->priv_size;
-	mlxsw_core = kzalloc(alloc_size, GFP_KERNEL);
-	if (!mlxsw_core) {
+	devlink = devlink_alloc(NULL, alloc_size);
+	if (!devlink) {
 		err = -ENOMEM;
-		goto err_core_alloc;
+		goto err_devlink_alloc;
 	}
 
+	mlxsw_core = devlink_priv(devlink);
 	INIT_LIST_HEAD(&mlxsw_core->rx_listener_list);
 	INIT_LIST_HEAD(&mlxsw_core->event_listener_list);
 	mlxsw_core->driver = mlxsw_driver;
@@ -841,6 +844,11 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (err)
 		goto err_hwmon_init;
 
+	set_devlink_dev(devlink, mlxsw_bus_info->dev);
+	err = devlink_register(devlink);
+	if (err)
+		goto err_devlink_register;
+
 	err = mlxsw_driver->init(mlxsw_core->driver_priv, mlxsw_core,
 				 mlxsw_bus_info);
 	if (err)
@@ -855,6 +863,8 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 err_debugfs_init:
 	mlxsw_core->driver->fini(mlxsw_core->driver_priv);
 err_driver_init:
+	devlink_unregister(devlink);
+err_devlink_register:
 err_hwmon_init:
 	mlxsw_emad_fini(mlxsw_core);
 err_emad_init:
@@ -864,8 +874,8 @@ err_bus_init:
 err_alloc_lag_mapping:
 	free_percpu(mlxsw_core->pcpu_stats);
 err_alloc_stats:
-	kfree(mlxsw_core);
-err_core_alloc:
+	devlink_free(devlink);
+err_devlink_alloc:
 	mlxsw_core_driver_put(device_kind);
 	return err;
 }
@@ -874,14 +884,16 @@ EXPORT_SYMBOL(mlxsw_core_bus_device_register);
 void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
 {
 	const char *device_kind = mlxsw_core->bus_info->device_kind;
+	struct devlink *devlink = priv_to_devlink(mlxsw_core);
 
 	mlxsw_core_debugfs_fini(mlxsw_core);
 	mlxsw_core->driver->fini(mlxsw_core->driver_priv);
+	devlink_unregister(devlink);
 	mlxsw_emad_fini(mlxsw_core);
 	mlxsw_core->bus->fini(mlxsw_core->bus_priv);
 	kfree(mlxsw_core->lag.mapping);
 	free_percpu(mlxsw_core->pcpu_stats);
-	kfree(mlxsw_core);
+	devlink_free(devlink);
 	mlxsw_core_driver_put(device_kind);
 }
 EXPORT_SYMBOL(mlxsw_core_bus_device_unregister);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 217856b..9d4b06c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -49,6 +49,7 @@
 #include <linux/jiffies.h>
 #include <linux/bitops.h>
 #include <linux/list.h>
+#include <net/devlink.h>
 #include <net/switchdev.h>
 #include <generated/utsrelease.h>
 
@@ -1351,7 +1352,9 @@ static const struct ethtool_ops mlxsw_sp_port_ethtool_ops = {
 
 static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 {
+	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
 	struct mlxsw_sp_port *mlxsw_sp_port;
+	struct devlink_port *devlink_port;
 	struct net_device *dev;
 	bool usable;
 	size_t bytes;
@@ -1360,6 +1363,7 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 	dev = alloc_etherdev(sizeof(struct mlxsw_sp_port));
 	if (!dev)
 		return -ENOMEM;
+	SET_NETDEV_DEV(dev, devlink_dev(devlink));
 	mlxsw_sp_port = netdev_priv(dev);
 	mlxsw_sp_port->dev = dev;
 	mlxsw_sp_port->mlxsw_sp = mlxsw_sp;
@@ -1417,6 +1421,14 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 		goto port_not_usable;
 	}
 
+	devlink_port = &mlxsw_sp_port->devlink_port;
+	err = devlink_port_register(devlink, devlink_port, local_port);
+	if (err) {
+		dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to register devlink port\n",
+			mlxsw_sp_port->local_port);
+		goto err_devlink_port_register;
+	}
+
 	err = mlxsw_sp_port_system_port_mapping_set(mlxsw_sp_port);
 	if (err) {
 		dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to set system port mapping\n",
@@ -1457,6 +1469,8 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 		goto err_register_netdev;
 	}
 
+	devlink_port_type_eth_set(devlink_port, dev);
+
 	err = mlxsw_sp_port_vlan_init(mlxsw_sp_port);
 	if (err)
 		goto err_port_vlan_init;
@@ -1472,6 +1486,8 @@ err_port_admin_status_set:
 err_port_mtu_set:
 err_port_swid_set:
 err_port_system_port_mapping_set:
+	devlink_port_unregister(&mlxsw_sp_port->devlink_port);
+err_devlink_port_register:
 port_not_usable:
 err_port_module_check:
 err_dev_addr_init:
@@ -1505,10 +1521,14 @@ static void mlxsw_sp_port_vports_fini(struct mlxsw_sp_port *mlxsw_sp_port)
 static void mlxsw_sp_port_remove(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 {
 	struct mlxsw_sp_port *mlxsw_sp_port = mlxsw_sp->ports[local_port];
+	struct devlink_port *devlink_port;
 
 	if (!mlxsw_sp_port)
 		return;
+	devlink_port = &mlxsw_sp_port->devlink_port;
+	devlink_port_type_clear(devlink_port);
 	unregister_netdev(mlxsw_sp_port->dev); /* This calls ndo_stop */
+	devlink_port_unregister(devlink_port);
 	mlxsw_sp_port_vports_fini(mlxsw_sp_port);
 	mlxsw_sp_port_switchdev_fini(mlxsw_sp_port);
 	free_percpu(mlxsw_sp_port->pcpu_stats);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 7f42eb1..0df9a19 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -43,6 +43,7 @@
 #include <linux/if_vlan.h>
 #include <linux/list.h>
 #include <net/switchdev.h>
+#include <net/devlink.h>
 
 #include "port.h"
 #include "core.h"
@@ -162,6 +163,7 @@ struct mlxsw_sp_port {
 	unsigned long *untagged_vlans;
 	/* VLAN interfaces */
 	struct list_head vports_list;
+	struct devlink_port devlink_port;
 };
 
 static inline struct mlxsw_sp_port *
diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
index d85960c..7a60a26 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
@@ -43,6 +43,7 @@
 #include <linux/device.h>
 #include <linux/skbuff.h>
 #include <linux/if_vlan.h>
+#include <net/devlink.h>
 #include <net/switchdev.h>
 #include <generated/utsrelease.h>
 
@@ -78,6 +79,7 @@ struct mlxsw_sx_port {
 	struct mlxsw_sx_port_pcpu_stats __percpu *pcpu_stats;
 	struct mlxsw_sx *mlxsw_sx;
 	u8 local_port;
+	struct devlink_port devlink_port;
 };
 
 /* tx_hdr_version
@@ -953,7 +955,9 @@ mlxsw_sx_port_mac_learning_mode_set(struct mlxsw_sx_port *mlxsw_sx_port,
 
 static int mlxsw_sx_port_create(struct mlxsw_sx *mlxsw_sx, u8 local_port)
 {
+	struct devlink *devlink = priv_to_devlink(mlxsw_sx->core);
 	struct mlxsw_sx_port *mlxsw_sx_port;
+	struct devlink_port *devlink_port;
 	struct net_device *dev;
 	bool usable;
 	int err;
@@ -1007,6 +1011,14 @@ static int mlxsw_sx_port_create(struct mlxsw_sx *mlxsw_sx, u8 local_port)
 		goto port_not_usable;
 	}
 
+	devlink_port = &mlxsw_sx_port->devlink_port;
+	err = devlink_port_register(devlink, devlink_port, local_port);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to register devlink port\n",
+			mlxsw_sx_port->local_port);
+		goto err_devlink_port_register;
+	}
+
 	err = mlxsw_sx_port_system_port_mapping_set(mlxsw_sx_port);
 	if (err) {
 		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to set system port mapping\n",
@@ -1064,6 +1076,8 @@ static int mlxsw_sx_port_create(struct mlxsw_sx *mlxsw_sx, u8 local_port)
 		goto err_register_netdev;
 	}
 
+	devlink_port_type_eth_set(devlink_port, dev);
+
 	mlxsw_sx->ports[local_port] = mlxsw_sx_port;
 	return 0;
 
@@ -1075,6 +1089,8 @@ err_port_mtu_set:
 err_port_speed_set:
 err_port_swid_set:
 err_port_system_port_mapping_set:
+	devlink_port_unregister(&mlxsw_sx_port->devlink_port);
+err_devlink_port_register:
 port_not_usable:
 err_port_module_check:
 err_dev_addr_get:
@@ -1087,11 +1103,15 @@ err_alloc_stats:
 static void mlxsw_sx_port_remove(struct mlxsw_sx *mlxsw_sx, u8 local_port)
 {
 	struct mlxsw_sx_port *mlxsw_sx_port = mlxsw_sx->ports[local_port];
+	struct devlink_port *devlink_port;
 
 	if (!mlxsw_sx_port)
 		return;
+	devlink_port = &mlxsw_sx_port->devlink_port;
+	devlink_port_type_clear(devlink_port);
 	unregister_netdev(mlxsw_sx_port->dev); /* This calls ndo_stop */
 	mlxsw_sx_port_swid_set(mlxsw_sx_port, MLXSW_PORT_SWID_DISABLED_PORT);
+	devlink_port_unregister(devlink_port);
 	free_percpu(mlxsw_sx_port->pcpu_stats);
 	free_netdev(mlxsw_sx_port->dev);
 }
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [patch net-next 5/9] mlxsw: core: Add devlink port splitter callbacks
  2016-02-22 18:31 [patch net-next 0/9] Introduce devlink interface and first drivers to use it Jiri Pirko
                   ` (3 preceding siblings ...)
  2016-02-22 18:31 ` [patch net-next 4/9] mlxsw: Implement " Jiri Pirko
@ 2016-02-22 18:32 ` Jiri Pirko
  2016-02-22 18:32 ` [patch net-next 6/9] mlxsw: spectrum: Unmap local port from module during teardown Jiri Pirko
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-22 18:32 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, brouer, ivecera, rami.rosen

From: Jiri Pirko <jiri@mellanox.com>

Add middle layer in mlxsw core code to forward port split/unsplit calls
into specific ASIC drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 34 +++++++++++++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlxsw/core.h |  2 ++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 57d9655..e9d3040 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -785,6 +785,38 @@ static void mlxsw_core_debugfs_fini(struct mlxsw_core *mlxsw_core)
 	debugfs_remove_recursive(mlxsw_core->dbg_dir);
 }
 
+static int mlxsw_devlink_port_split(struct devlink *devlink,
+				    unsigned int port_index,
+				    unsigned int count)
+{
+	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+
+	if (port_index >= MLXSW_PORT_MAX_PORTS)
+		return -EINVAL;
+	if (!mlxsw_core->driver->port_split)
+		return -EOPNOTSUPP;
+	return mlxsw_core->driver->port_split(mlxsw_core->driver_priv,
+					      port_index, count);
+}
+
+static int mlxsw_devlink_port_unsplit(struct devlink *devlink,
+				      unsigned int port_index)
+{
+	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+
+	if (port_index >= MLXSW_PORT_MAX_PORTS)
+		return -EINVAL;
+	if (!mlxsw_core->driver->port_unsplit)
+		return -EOPNOTSUPP;
+	return mlxsw_core->driver->port_unsplit(mlxsw_core->driver_priv,
+						port_index);
+}
+
+static const struct devlink_ops mlxsw_devlink_ops = {
+	.port_split	= mlxsw_devlink_port_split,
+	.port_unsplit	= mlxsw_devlink_port_unsplit,
+};
+
 int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 				   const struct mlxsw_bus *mlxsw_bus,
 				   void *bus_priv)
@@ -800,7 +832,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (!mlxsw_driver)
 		return -EINVAL;
 	alloc_size = sizeof(*mlxsw_core) + mlxsw_driver->priv_size;
-	devlink = devlink_alloc(NULL, alloc_size);
+	devlink = devlink_alloc(&mlxsw_devlink_ops, alloc_size);
 	if (!devlink) {
 		err = -ENOMEM;
 		goto err_devlink_alloc;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index a017236..c73d1c0 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -186,6 +186,8 @@ struct mlxsw_driver {
 	int (*init)(void *driver_priv, struct mlxsw_core *mlxsw_core,
 		    const struct mlxsw_bus_info *mlxsw_bus_info);
 	void (*fini)(void *driver_priv);
+	int (*port_split)(void *driver_priv, u8 local_port, unsigned int count);
+	int (*port_unsplit)(void *driver_priv, u8 local_port);
 	void (*txhdr_construct)(struct sk_buff *skb,
 				const struct mlxsw_tx_info *tx_info);
 	u8 txhdr_len;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [patch net-next 6/9] mlxsw: spectrum: Unmap local port from module during teardown
  2016-02-22 18:31 [patch net-next 0/9] Introduce devlink interface and first drivers to use it Jiri Pirko
                   ` (4 preceding siblings ...)
  2016-02-22 18:32 ` [patch net-next 5/9] mlxsw: core: Add devlink port splitter callbacks Jiri Pirko
@ 2016-02-22 18:32 ` Jiri Pirko
  2016-02-22 20:32   ` John Fastabend
  2016-02-22 18:32 ` [patch net-next 7/9] mlxsw: spectrum: Store local port to module mapping during init Jiri Pirko
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2016-02-22 18:32 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, brouer, ivecera, rami.rosen

From: Ido Schimmel <idosch@mellanox.com>

When splitting a port we replace it with 2 or 4 other ports. To be able
to do that we need to remove the original port netdev and unmap it from
its module. However, we first mark it as disabled, as active ports
cannot be unmapped.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 9d4b06c..1d6f1ef 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -320,6 +320,15 @@ static int mlxsw_sp_port_module_check(struct mlxsw_sp_port *mlxsw_sp_port,
 	return 0;
 }
 
+static int mlxsw_sp_port_module_unmap(struct mlxsw_sp *mlxsw_sp, u8 local_port)
+{
+	char pmlp_pl[MLXSW_REG_PMLP_LEN];
+
+	mlxsw_reg_pmlp_pack(pmlp_pl, local_port);
+	mlxsw_reg_pmlp_width_set(pmlp_pl, 0);
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(pmlp), pmlp_pl);
+}
+
 static int mlxsw_sp_port_open(struct net_device *dev)
 {
 	struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
@@ -1531,6 +1540,8 @@ static void mlxsw_sp_port_remove(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 	devlink_port_unregister(devlink_port);
 	mlxsw_sp_port_vports_fini(mlxsw_sp_port);
 	mlxsw_sp_port_switchdev_fini(mlxsw_sp_port);
+	mlxsw_sp_port_swid_set(mlxsw_sp_port, MLXSW_PORT_SWID_DISABLED_PORT);
+	mlxsw_sp_port_module_unmap(mlxsw_sp, mlxsw_sp_port->local_port);
 	free_percpu(mlxsw_sp_port->pcpu_stats);
 	kfree(mlxsw_sp_port->untagged_vlans);
 	kfree(mlxsw_sp_port->active_vlans);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [patch net-next 7/9] mlxsw: spectrum: Store local port to module mapping during init
  2016-02-22 18:31 [patch net-next 0/9] Introduce devlink interface and first drivers to use it Jiri Pirko
                   ` (5 preceding siblings ...)
  2016-02-22 18:32 ` [patch net-next 6/9] mlxsw: spectrum: Unmap local port from module during teardown Jiri Pirko
@ 2016-02-22 18:32 ` Jiri Pirko
  2016-02-22 18:32 ` [patch net-next 8/9] mlxsw: spectrum: Mark unused ports using NULL Jiri Pirko
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-22 18:32 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, brouer, ivecera, rami.rosen

From: Ido Schimmel <idosch@mellanox.com>

The port netdevs are each associated with a different local port number
in the device. These local ports are grouped into groups of 4 (e.g.
(1-4), (5-8)) called clusters. The cluster constitutes the one of two
possible modules they can be mapped to. This mapping is board-specific
and done by the device's firmware during init.

When splitting a port by 4, the device requires us to first unmap all
the ports in the cluster and then map each to a single lane in the module
associated with the port netdev used as the handle for the operation.
This means that two port netdevs will disappear, as only 100Gb/s (4
lanes) ports can be split and we are guaranteed to have two of these
((1, 3), (5, 7) etc.) in a cluster.

When unsplit occurs we need to reinstantiate the two original 100Gb/s
ports and map each to its origianl module. Therefore, during driver init
store the initial local port to module mapping, so it can be used later
during unsplitting.

Note that a by 2 split doesn't require us to store the mapping, as we
only need to reinstantiate one port whose module is known.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 36 +++++++++++---------------
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  1 +
 2 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 1d6f1ef..b9d963a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -305,18 +305,19 @@ mlxsw_sp_port_system_port_mapping_set(struct mlxsw_sp_port *mlxsw_sp_port)
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sspr), sspr_pl);
 }
 
-static int mlxsw_sp_port_module_check(struct mlxsw_sp_port *mlxsw_sp_port,
-				      bool *p_usable)
+static int mlxsw_sp_port_module_info_get(struct mlxsw_sp *mlxsw_sp,
+					 u8 local_port, u8 *p_module,
+					 u8 *p_width)
 {
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
 	char pmlp_pl[MLXSW_REG_PMLP_LEN];
 	int err;
 
-	mlxsw_reg_pmlp_pack(pmlp_pl, mlxsw_sp_port->local_port);
+	mlxsw_reg_pmlp_pack(pmlp_pl, local_port);
 	err = mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(pmlp), pmlp_pl);
 	if (err)
 		return err;
-	*p_usable = mlxsw_reg_pmlp_width_get(pmlp_pl) ? true : false;
+	*p_module = mlxsw_reg_pmlp_module_get(pmlp_pl, 0);
+	*p_width = mlxsw_reg_pmlp_width_get(pmlp_pl);
 	return 0;
 }
 
@@ -1365,7 +1366,6 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 	struct mlxsw_sp_port *mlxsw_sp_port;
 	struct devlink_port *devlink_port;
 	struct net_device *dev;
-	bool usable;
 	size_t bytes;
 	int err;
 
@@ -1417,19 +1417,6 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 	 */
 	dev->hard_header_len += MLXSW_TXHDR_LEN;
 
-	err = mlxsw_sp_port_module_check(mlxsw_sp_port, &usable);
-	if (err) {
-		dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to check module\n",
-			mlxsw_sp_port->local_port);
-		goto err_port_module_check;
-	}
-
-	if (!usable) {
-		dev_dbg(mlxsw_sp->bus_info->dev, "Port %d: Not usable, skipping initialization\n",
-			mlxsw_sp_port->local_port);
-		goto port_not_usable;
-	}
-
 	devlink_port = &mlxsw_sp_port->devlink_port;
 	err = devlink_port_register(devlink, devlink_port, local_port);
 	if (err) {
@@ -1497,8 +1484,6 @@ err_port_swid_set:
 err_port_system_port_mapping_set:
 	devlink_port_unregister(&mlxsw_sp_port->devlink_port);
 err_devlink_port_register:
-port_not_usable:
-err_port_module_check:
 err_dev_addr_init:
 	free_percpu(mlxsw_sp_port->pcpu_stats);
 err_alloc_stats:
@@ -1560,6 +1545,7 @@ static void mlxsw_sp_ports_remove(struct mlxsw_sp *mlxsw_sp)
 static int mlxsw_sp_ports_create(struct mlxsw_sp *mlxsw_sp)
 {
 	size_t alloc_size;
+	u8 module, width;
 	int i;
 	int err;
 
@@ -1569,6 +1555,13 @@ static int mlxsw_sp_ports_create(struct mlxsw_sp *mlxsw_sp)
 		return -ENOMEM;
 
 	for (i = 1; i < MLXSW_PORT_MAX_PORTS; i++) {
+		err = mlxsw_sp_port_module_info_get(mlxsw_sp, i, &module,
+						    &width);
+		if (err)
+			goto err_port_module_info_get;
+		if (!width)
+			continue;
+		mlxsw_sp->port_to_module[i] = module;
 		err = mlxsw_sp_port_create(mlxsw_sp, i);
 		if (err)
 			goto err_port_create;
@@ -1576,6 +1569,7 @@ static int mlxsw_sp_ports_create(struct mlxsw_sp *mlxsw_sp)
 	return 0;
 
 err_port_create:
+err_port_module_info_get:
 	for (i--; i >= 1; i--)
 		mlxsw_sp_port_remove(mlxsw_sp, i);
 	kfree(mlxsw_sp->ports);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 0df9a19..c75dd08 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -123,6 +123,7 @@ struct mlxsw_sp {
 	u32 ageing_time;
 	struct mlxsw_sp_upper master_bridge;
 	struct mlxsw_sp_upper lags[MLXSW_SP_LAG_MAX];
+	u8 port_to_module[MLXSW_PORT_MAX_PORTS];
 };
 
 static inline struct mlxsw_sp_upper *
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [patch net-next 8/9] mlxsw: spectrum: Mark unused ports using NULL
  2016-02-22 18:31 [patch net-next 0/9] Introduce devlink interface and first drivers to use it Jiri Pirko
                   ` (6 preceding siblings ...)
  2016-02-22 18:32 ` [patch net-next 7/9] mlxsw: spectrum: Store local port to module mapping during init Jiri Pirko
@ 2016-02-22 18:32 ` Jiri Pirko
  2016-02-22 18:32 ` [patch net-next 9/9] mlxsw: spectrum: Introduce port splitting Jiri Pirko
  2016-02-23  5:12 ` [patch net-next 0/9] Introduce devlink interface and first drivers to use it Andy Gospodarek
  9 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-22 18:32 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, brouer, ivecera, rami.rosen

From: Ido Schimmel <idosch@mellanox.com>

When splitting and unsplitting we'll destroy usable ports on the fly, so
mark them using a NULL pointer to indicate that their local port number
is free and can be re-used.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index b9d963a..e72de3b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1519,6 +1519,7 @@ static void mlxsw_sp_port_remove(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 
 	if (!mlxsw_sp_port)
 		return;
+	mlxsw_sp->ports[local_port] = NULL;
 	devlink_port = &mlxsw_sp_port->devlink_port;
 	devlink_port_type_clear(devlink_port);
 	unregister_netdev(mlxsw_sp_port->dev); /* This calls ndo_stop */
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [patch net-next 9/9] mlxsw: spectrum: Introduce port splitting
  2016-02-22 18:31 [patch net-next 0/9] Introduce devlink interface and first drivers to use it Jiri Pirko
                   ` (7 preceding siblings ...)
  2016-02-22 18:32 ` [patch net-next 8/9] mlxsw: spectrum: Mark unused ports using NULL Jiri Pirko
@ 2016-02-22 18:32 ` Jiri Pirko
  2016-02-23  5:12 ` [patch net-next 0/9] Introduce devlink interface and first drivers to use it Andy Gospodarek
  9 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-22 18:32 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, brouer, ivecera, rami.rosen

From: Ido Schimmel <idosch@mellanox.com>

Allow a user to split or unsplit a port using the newly introduced
devlink ops.

Once split, the original netdev is destroyed and 2 or 4 others are
created, according to user configuration. The new ports are like any
other port, with the sole difference of supporting a lower maximum
speed. When unsplit, the reverse process takes place.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/port.h     |   2 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 178 ++++++++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |   5 +-
 3 files changed, 182 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/port.h b/drivers/net/ethernet/mellanox/mlxsw/port.h
index 726f543..b9cde57 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/port.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/port.h
@@ -59,6 +59,8 @@
 
 #define MLXSW_PORT_DONT_CARE		(MLXSW_PORT_MAX_PORTS)
 
+#define MLXSW_PORT_MODULE_MAX_WIDTH	4
+
 enum mlxsw_port_admin_status {
 	MLXSW_PORT_ADMIN_STATUS_UP = 1,
 	MLXSW_PORT_ADMIN_STATUS_DOWN = 2,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index e72de3b..6f3f2b3 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -321,6 +321,22 @@ static int mlxsw_sp_port_module_info_get(struct mlxsw_sp *mlxsw_sp,
 	return 0;
 }
 
+static int mlxsw_sp_port_module_map(struct mlxsw_sp *mlxsw_sp, u8 local_port,
+				    u8 module, u8 width, u8 lane)
+{
+	char pmlp_pl[MLXSW_REG_PMLP_LEN];
+	int i;
+
+	mlxsw_reg_pmlp_pack(pmlp_pl, local_port);
+	mlxsw_reg_pmlp_width_set(pmlp_pl, width);
+	for (i = 0; i < width; i++) {
+		mlxsw_reg_pmlp_module_set(pmlp_pl, i, module);
+		mlxsw_reg_pmlp_tx_lane_set(pmlp_pl, i, lane + i);  /* Rx & Tx */
+	}
+
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(pmlp), pmlp_pl);
+}
+
 static int mlxsw_sp_port_module_unmap(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 {
 	char pmlp_pl[MLXSW_REG_PMLP_LEN];
@@ -1360,7 +1376,8 @@ static const struct ethtool_ops mlxsw_sp_port_ethtool_ops = {
 	.set_settings		= mlxsw_sp_port_set_settings,
 };
 
-static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
+static int __mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port,
+				  bool split, u8 module)
 {
 	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
 	struct mlxsw_sp_port *mlxsw_sp_port;
@@ -1377,6 +1394,7 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 	mlxsw_sp_port->dev = dev;
 	mlxsw_sp_port->mlxsw_sp = mlxsw_sp;
 	mlxsw_sp_port->local_port = local_port;
+	mlxsw_sp_port->split = split;
 	bytes = DIV_ROUND_UP(VLAN_N_VID, BITS_PER_BYTE);
 	mlxsw_sp_port->active_vlans = kzalloc(bytes, GFP_KERNEL);
 	if (!mlxsw_sp_port->active_vlans) {
@@ -1418,6 +1436,8 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 	dev->hard_header_len += MLXSW_TXHDR_LEN;
 
 	devlink_port = &mlxsw_sp_port->devlink_port;
+	if (mlxsw_sp_port->split)
+		devlink_port_split_set(devlink_port, module);
 	err = devlink_port_register(devlink, devlink_port, local_port);
 	if (err) {
 		dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to register devlink port\n",
@@ -1495,6 +1515,27 @@ err_port_active_vlans_alloc:
 	return err;
 }
 
+static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port,
+				bool split, u8 module, u8 width, u8 lane)
+{
+	int err;
+
+	err = mlxsw_sp_port_module_map(mlxsw_sp, local_port, module, width,
+				       lane);
+	if (err)
+		return err;
+
+	err = __mlxsw_sp_port_create(mlxsw_sp, local_port, split, module);
+	if (err)
+		goto err_port_create;
+
+	return 0;
+
+err_port_create:
+	mlxsw_sp_port_module_unmap(mlxsw_sp, local_port);
+	return err;
+}
+
 static void mlxsw_sp_port_vports_fini(struct mlxsw_sp_port *mlxsw_sp_port)
 {
 	struct net_device *dev = mlxsw_sp_port->dev;
@@ -1563,7 +1604,7 @@ static int mlxsw_sp_ports_create(struct mlxsw_sp *mlxsw_sp)
 		if (!width)
 			continue;
 		mlxsw_sp->port_to_module[i] = module;
-		err = mlxsw_sp_port_create(mlxsw_sp, i);
+		err = __mlxsw_sp_port_create(mlxsw_sp, i, false, module);
 		if (err)
 			goto err_port_create;
 	}
@@ -1577,6 +1618,137 @@ err_port_module_info_get:
 	return err;
 }
 
+static u8 mlxsw_sp_cluster_base_port_get(u8 local_port)
+{
+	u8 offset = (local_port - 1) % MLXSW_SP_PORTS_PER_CLUSTER_MAX;
+
+	return local_port - offset;
+}
+
+static int mlxsw_sp_port_split(void *priv, u8 local_port, unsigned int count)
+{
+	struct mlxsw_sp *mlxsw_sp = priv;
+	struct mlxsw_sp_port *mlxsw_sp_port;
+	u8 width = MLXSW_PORT_MODULE_MAX_WIDTH / count;
+	u8 module, cur_width, base_port;
+	int i;
+	int err;
+
+	mlxsw_sp_port = mlxsw_sp->ports[local_port];
+	if (!mlxsw_sp_port) {
+		dev_err(mlxsw_sp->bus_info->dev, "Port number \"%d\" does not exist\n",
+			local_port);
+		return -EINVAL;
+	}
+
+	if (count != 2 && count != 4) {
+		netdev_err(mlxsw_sp_port->dev, "Port can only be split into 2 or 4 ports\n");
+		return -EINVAL;
+	}
+
+	err = mlxsw_sp_port_module_info_get(mlxsw_sp, local_port, &module,
+					    &cur_width);
+	if (err) {
+		netdev_err(mlxsw_sp_port->dev, "Failed to get port's width\n");
+		return err;
+	}
+
+	if (cur_width != MLXSW_PORT_MODULE_MAX_WIDTH) {
+		netdev_err(mlxsw_sp_port->dev, "Port cannot be split further\n");
+		return -EINVAL;
+	}
+
+	/* Make sure we have enough slave (even) ports for the split. */
+	if (count == 2) {
+		base_port = local_port;
+		if (mlxsw_sp->ports[base_port + 1]) {
+			netdev_err(mlxsw_sp_port->dev, "Invalid split configuration\n");
+			return -EINVAL;
+		}
+	} else {
+		base_port = mlxsw_sp_cluster_base_port_get(local_port);
+		if (mlxsw_sp->ports[base_port + 1] ||
+		    mlxsw_sp->ports[base_port + 3]) {
+			netdev_err(mlxsw_sp_port->dev, "Invalid split configuration\n");
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < count; i++)
+		mlxsw_sp_port_remove(mlxsw_sp, base_port + i);
+
+	for (i = 0; i < count; i++) {
+		err = mlxsw_sp_port_create(mlxsw_sp, base_port + i, true,
+					   module, width, i * width);
+		if (err) {
+			dev_err(mlxsw_sp->bus_info->dev, "Failed to create split port\n");
+			goto err_port_create;
+		}
+	}
+
+	return 0;
+
+err_port_create:
+	for (i--; i >= 0; i--)
+		mlxsw_sp_port_remove(mlxsw_sp, base_port + i);
+	for (i = 0; i < count / 2; i++) {
+		module = mlxsw_sp->port_to_module[base_port + i * 2];
+		mlxsw_sp_port_create(mlxsw_sp, base_port + i * 2, false,
+				     module, MLXSW_PORT_MODULE_MAX_WIDTH, 0);
+	}
+	return err;
+}
+
+static int mlxsw_sp_port_unsplit(void *priv, u8 local_port)
+{
+	struct mlxsw_sp *mlxsw_sp = priv;
+	struct mlxsw_sp_port *mlxsw_sp_port;
+	u8 module, cur_width, base_port;
+	unsigned int count;
+	int i;
+	int err;
+
+	mlxsw_sp_port = mlxsw_sp->ports[local_port];
+	if (!mlxsw_sp_port) {
+		dev_err(mlxsw_sp->bus_info->dev, "Port number \"%d\" does not exist\n",
+			local_port);
+		return -EINVAL;
+	}
+
+	if (!mlxsw_sp_port->split) {
+		netdev_err(mlxsw_sp_port->dev, "Port wasn't split\n");
+		return -EINVAL;
+	}
+
+	err = mlxsw_sp_port_module_info_get(mlxsw_sp, local_port, &module,
+					    &cur_width);
+	if (err) {
+		netdev_err(mlxsw_sp_port->dev, "Failed to get port's width\n");
+		return err;
+	}
+	count = cur_width == 1 ? 4 : 2;
+
+	base_port = mlxsw_sp_cluster_base_port_get(local_port);
+
+	/* Determine which ports to remove. */
+	if (count == 2 && local_port >= base_port + 2)
+		base_port = base_port + 2;
+
+	for (i = 0; i < count; i++)
+		mlxsw_sp_port_remove(mlxsw_sp, base_port + i);
+
+	for (i = 0; i < count / 2; i++) {
+		module = mlxsw_sp->port_to_module[base_port + i * 2];
+		err = mlxsw_sp_port_create(mlxsw_sp, base_port + i * 2, false,
+					   module, MLXSW_PORT_MODULE_MAX_WIDTH,
+					   0);
+		if (err)
+			dev_err(mlxsw_sp->bus_info->dev, "Failed to reinstantiate port\n");
+	}
+
+	return 0;
+}
+
 static void mlxsw_sp_pude_event_func(const struct mlxsw_reg_info *reg,
 				     char *pude_pl, void *priv)
 {
@@ -2000,6 +2172,8 @@ static struct mlxsw_driver mlxsw_sp_driver = {
 	.priv_size		= sizeof(struct mlxsw_sp),
 	.init			= mlxsw_sp_init,
 	.fini			= mlxsw_sp_fini,
+	.port_split		= mlxsw_sp_port_split,
+	.port_unsplit		= mlxsw_sp_port_unsplit,
 	.txhdr_construct	= mlxsw_sp_txhdr_construct,
 	.txhdr_len		= MLXSW_TXHDR_LEN,
 	.profile		= &mlxsw_sp_config_profile,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index c75dd08..2f325f5 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -58,6 +58,8 @@
 
 #define MLXSW_SP_MID_MAX 7000
 
+#define MLXSW_SP_PORTS_PER_CLUSTER_MAX 4
+
 struct mlxsw_sp_port;
 
 struct mlxsw_sp_upper {
@@ -151,7 +153,8 @@ struct mlxsw_sp_port {
 	   learning_sync:1,
 	   uc_flood:1,
 	   bridged:1,
-	   lagged:1;
+	   lagged:1,
+	   split:1;
 	u16 pvid;
 	u16 lag_id;
 	struct {
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [patch net-next 6/9] mlxsw: spectrum: Unmap local port from module during teardown
  2016-02-22 18:32 ` [patch net-next 6/9] mlxsw: spectrum: Unmap local port from module during teardown Jiri Pirko
@ 2016-02-22 20:32   ` John Fastabend
  2016-02-22 21:00     ` Ido Schimmel
  2016-02-23  4:50     ` Andy Gospodarek
  0 siblings, 2 replies; 42+ messages in thread
From: John Fastabend @ 2016-02-22 20:32 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	jeffrey.t.kirsher, brouer, ivecera, rami.rosen

On 16-02-22 10:32 AM, Jiri Pirko wrote:
> From: Ido Schimmel <idosch@mellanox.com>
> 
> When splitting a port we replace it with 2 or 4 other ports. To be able
> to do that we need to remove the original port netdev and unmap it from
> its module. However, we first mark it as disabled, as active ports
> cannot be unmapped.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---

Hi Jiri, Ido,

You've sort of lost me on this port splitting/unsplitting thread. What
does this actually do? Are you just creating two netdevs and LAGing them
in the hardware, I'm guessing not or you wouldn't have some device API
for it and would do it using normal methods.

If its something to do with physical layout of the board itself why
don't you trigger this based on some init time introspection or an
interrupt if someone plugs in a port splitting cable/module (does that
exist?).

Thanks,
John

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 6/9] mlxsw: spectrum: Unmap local port from module during teardown
  2016-02-22 20:32   ` John Fastabend
@ 2016-02-22 21:00     ` Ido Schimmel
  2016-02-22 21:08       ` John Fastabend
  2016-02-23  4:50     ` Andy Gospodarek
  1 sibling, 1 reply; 42+ messages in thread
From: Ido Schimmel @ 2016-02-22 21:00 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jiri Pirko, netdev, davem, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, jeffrey.t.kirsher, brouer, ivecera, rami.rosen

Hi John,

Mon, Feb 22, 2016 at 10:32:47PM IST, john.fastabend@gmail.com wrote:
>On 16-02-22 10:32 AM, Jiri Pirko wrote:
>> From: Ido Schimmel <idosch@mellanox.com>
>> 
>> When splitting a port we replace it with 2 or 4 other ports. To be able
>> to do that we need to remove the original port netdev and unmap it from
>> its module. However, we first mark it as disabled, as active ports
>> cannot be unmapped.
>> 
>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>
>Hi Jiri, Ido,
>
>You've sort of lost me on this port splitting/unsplitting thread. What
>does this actually do? Are you just creating two netdevs and LAGing them
>in the hardware, I'm guessing not or you wouldn't have some device API
>for it and would do it using normal methods.

Yep, it's not LAG. You basically have a mapping between a physical
module and a local port, which is represented by a port netdev.

Each module has 4 lanes, so if you connect a splitter (say a 2x) you can
map each 2 lanes to a different port and assign each a new local port.
These are completely independent from each other, but they can only give
you 50Gb/s max, as opposed to the original 100Gb/s (as it had 4 lanes
all to itself).

>
>If its something to do with physical layout of the board itself why
>don't you trigger this based on some init time introspection or an
>interrupt if someone plugs in a port splitting cable/module (does that
>exist?).

We currently don't have an event that tells us that a splitter is
connected. Also, had we created / destroyed these based on events, then
an accidental removal of the splitter would cause all the configuration
setup on these ports to disappear (say VLANs on a bridged port, unicast
flooding etc.).

Thanks.

>
>Thanks,
>John
>
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 6/9] mlxsw: spectrum: Unmap local port from module during teardown
  2016-02-22 21:00     ` Ido Schimmel
@ 2016-02-22 21:08       ` John Fastabend
  0 siblings, 0 replies; 42+ messages in thread
From: John Fastabend @ 2016-02-22 21:08 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Jiri Pirko, netdev, davem, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, jeffrey.t.kirsher, brouer, ivecera, rami.rosen

On 16-02-22 01:00 PM, Ido Schimmel wrote:
> Hi John,
> 
> Mon, Feb 22, 2016 at 10:32:47PM IST, john.fastabend@gmail.com wrote:
>> On 16-02-22 10:32 AM, Jiri Pirko wrote:
>>> From: Ido Schimmel <idosch@mellanox.com>
>>>
>>> When splitting a port we replace it with 2 or 4 other ports. To be able
>>> to do that we need to remove the original port netdev and unmap it from
>>> its module. However, we first mark it as disabled, as active ports
>>> cannot be unmapped.
>>>
>>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>
>> Hi Jiri, Ido,
>>
>> You've sort of lost me on this port splitting/unsplitting thread. What
>> does this actually do? Are you just creating two netdevs and LAGing them
>> in the hardware, I'm guessing not or you wouldn't have some device API
>> for it and would do it using normal methods.
> 
> Yep, it's not LAG. You basically have a mapping between a physical
> module and a local port, which is represented by a port netdev.
> 
> Each module has 4 lanes, so if you connect a splitter (say a 2x) you can
> map each 2 lanes to a different port and assign each a new local port.
> These are completely independent from each other, but they can only give
> you 50Gb/s max, as opposed to the original 100Gb/s (as it had 4 lanes
> all to itself).
> 
>>
>> If its something to do with physical layout of the board itself why
>> don't you trigger this based on some init time introspection or an
>> interrupt if someone plugs in a port splitting cable/module (does that
>> exist?).
> 
> We currently don't have an event that tells us that a splitter is
> connected. Also, had we created / destroyed these based on events, then
> an accidental removal of the splitter would cause all the configuration
> setup on these ports to disappear (say VLANs on a bridged port, unicast
> flooding etc.).
> 

I still think this would be the better implementation. The configuration
should be saved in some daemon anyways and setup based on netdev events
so its not like it would disappear in any real system.

But seeing you don't get an interrupt or anything I guess manual
configuration is going to be the best you can do.


> Thanks.
> 
>>
>> Thanks,
>> John
>>
>>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 1/9] Introduce devlink infrastructure
  2016-02-22 18:31 ` [patch net-next 1/9] Introduce devlink infrastructure Jiri Pirko
@ 2016-02-22 22:29   ` roopa
  2016-02-23  7:47     ` Jiri Pirko
  2016-02-24  7:02   ` Yuval Mintz
  1 sibling, 1 reply; 42+ messages in thread
From: roopa @ 2016-02-22 22:29 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, nikolay, hadarh,
	jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

On 2/22/16, 10:31 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Introduce devlink infrastructure for drivers to register and expose to
> userspace via generic Netlink interface.
>
> There are two basic objects defined:
> devlink - one instance for every "parent device", for example switch ASIC
> devlink port - one instance for every physical port of the device.

Like i have expressed earlier, the only thing that bothers me here is that we are creating a new devlink object for switch port when there
is an existing netdev object.

Is there a chance that the drivers you are targeting can still create netdevs for physical ports ?
It would make things so much more consistent and simpler to manage without a cost of adding yet another interface.

The port splitter support is needed at the netdev api too (ie rtnetlink). Most switchdev drivers that expose netdevs
would benefit from a native 'ip link' way to configure port splitting. This will be useful for nic drivers too.

thanks,
Roopa

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 6/9] mlxsw: spectrum: Unmap local port from module during teardown
  2016-02-22 20:32   ` John Fastabend
  2016-02-22 21:00     ` Ido Schimmel
@ 2016-02-23  4:50     ` Andy Gospodarek
  1 sibling, 0 replies; 42+ messages in thread
From: Andy Gospodarek @ 2016-02-23  4:50 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jiri Pirko, netdev, davem, idosch, eladr, yotamg, ogerlitz,
	yishaih, dledford, sean.hefty, hal.rosenstock, eugenia, roopa,
	nikolay, hadarh, jhs, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

On Mon, Feb 22, 2016 at 12:32:47PM -0800, John Fastabend wrote:
> On 16-02-22 10:32 AM, Jiri Pirko wrote:
> > From: Ido Schimmel <idosch@mellanox.com>
> > 
> > When splitting a port we replace it with 2 or 4 other ports. To be able
> > to do that we need to remove the original port netdev and unmap it from
> > its module. However, we first mark it as disabled, as active ports
> > cannot be unmapped.
> > 
> > Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> > Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> > ---
> 
> Hi Jiri, Ido,
> 
> You've sort of lost me on this port splitting/unsplitting thread. What
> does this actually do? Are you just creating two netdevs and LAGing them
> in the hardware, I'm guessing not or you wouldn't have some device API
> for it and would do it using normal methods.
> 
> If its something to do with physical layout of the board itself why
> don't you trigger this based on some init time introspection or an
> interrupt if someone plugs in a port splitting cable/module (does that
> exist?).

In some implementations there are interrupts that fire when modules are
connected or removed.  Even if an interrupt doesn't fire, it would be
possible to have a timer event/workqueue in a driver that could read
QFSP VPD information periodically to note that something has changed in
the hardware and reconfigure as needed.  I actually would perfer keying
on something like that to perform a configuration change like rather
than requiring the user configure it.

Today you have to swap a 4x10GbE for a 1x40GbE module and then
performing significant software config (possibly flashing EEPROMs in the
case of some NICs) each time.  That seems too error prone as a user
could swap out a QSFP on what they think is one port on a switch and
then perform a software change on another port without realizing it.

That accidental disruption seems less tolerable than the one where there
wrong physical port was added and removed and config needs to be
applied.  Network management apps can properly solve the problem where
devices come and go accidently due to connecting the wrong QSFP, so that
seems like a less critical use-case to handle than what I described
above.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 0/9] Introduce devlink interface and first drivers to use it
  2016-02-22 18:31 [patch net-next 0/9] Introduce devlink interface and first drivers to use it Jiri Pirko
                   ` (8 preceding siblings ...)
  2016-02-22 18:32 ` [patch net-next 9/9] mlxsw: spectrum: Introduce port splitting Jiri Pirko
@ 2016-02-23  5:12 ` Andy Gospodarek
  2016-02-23  7:32   ` Jiri Pirko
  9 siblings, 1 reply; 42+ messages in thread
From: Andy Gospodarek @ 2016-02-23  5:12 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

On Mon, Feb 22, 2016 at 07:31:55PM +0100, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> There a is need for some userspace API that would allow to expose things
> that are not directly related to any device class like net_device of
> ib_device, but rather chip-wide/switch-ASIC-wide stuff.
> 
> Use cases:
> 1) get/set of port type (Ethernet/InfiniBand)
> 2) setting up port splitters - split port into multiple ones and squash again,
>    enables usage of splitter cable
> 3) setting up shared buffers - shared among multiple ports within
>    one chip (work in progress)
> 4) configuration of switch wide properties - resources division etc - This will
>    allow to pass configuration that is unacceptable to be passed as
>    a module option.

I'm generally a fan of use cases #3 and #4 (as we have previously
discussed), but I'm not sure I agree that the implementation for #2
right now.

I'm not sure I would like userspace to have control over whether or not
a port should be split or not when the hardware can be queried to
determine this.

> First patch of this set introduces a new generic Netlink based interface,
> called "devlink". It is similar to nl80211 model and it is heavily
> influenced by it, including the API definition. The devlink introduction patch
> implements use cases 1) and 2). Other 2 are in development atm and will
> be addressed by follow-ups.
> 
> It is very convenient for drivers to use devlink, as you can see in other
> patches in this set.
> 
> Counterpart for devlink is userspace tool for now called "dl". Command line
> interface and outputs are derived from "ip" tool so it should be easy
> for users to get used to it.
> 
> It is available here as a standalone tool for now:
> https://github.com/jpirko/devlink
> After this is merge in kernel, I will include the "dl" or "devlink" tool
> into iproute2 toolset.
> 
> Port type setting example:
> 	myhost:~$ dl help
> 	Usage: dl [ OPTIONS ] OBJECT { COMMAND | help }
> 	where  OBJECT := { dev | port | monitor }
> 	       OPTIONS := { -v/--verbose }
> 
> 	myhost:~$ dl dev help
> 	Usage: dl dev show [DEV]
> 	Usage: dl dev set DEV [ name NEWNAME ]
> 	
> 	myhost:~$ dl dev show
> 	0: devlink0: bus pci dev 0000:01:00.0
> 	
> 	myhost:~$ dl port help
> 	Usage: dl port show [DEV/PORT_INDEX]
> 	Usage: dl port set DEV/PORT_INDEX [ type { eth | ib | auto} ]
> 	Usage: dl port split DEV/PORT_INDEX count
> 	Usage: dl port unsplit DEV/PORT_INDEX
> 	
> 	myhost:~$ dl port show
> 	devlink0/1: type ib ibdev mlx4_0
> 	devlink0/2: type ib ibdev mlx4_0
> 	
> 	myhost:~$ sudo dl port set devlink0/1 type eth
> 	
> 	myhost:~$ dl port show
> 	devlink0/1: type eth netdev ens4
                             ^^^^^^^^^^^
> 	devlink0/2: type ib ibdev mlx4_0
                            ^^^^^^^^^^^^
I think my only other question about this implementation is whether or
not one would really want to have the true netdev/ibdev names mapped
here.

Would be as reasonable to simply specify the type (and there may be more
types within ethernet that could be useful in multi-chip configurations)
and then let normal infrastructure that exists today figure out how to
map the names for the netdevs to the devices?

> 	myhost:~$ sudo dl port set devlink0/2 type auto
> 	
> 	myhost:~$ dl port show
> 	devlink0/1: type eth netdev ens4
> 	devlink0/2: type ib(auto) ibdev mlx4_0
> 
> Port splitting example:
> 	myswitch:~$ dl port
> 	devlink0/1: type eth netdev eth0
> 	devlink0/3: type eth netdev eth1
> 	devlink0/5: type eth netdev eth2
> 	...
> 	devlink0/63: type eth netdev eth31
> 	
> 	myswitch:~$ sudo dl port split devlink0/1 2
> 	
> 	myswitch:~$ dl port
> 	devlink0/3: type eth netdev eth1
> 	devlink0/5: type eth netdev eth2
> 	...
> 	devlink0/63: type eth netdev eth31
> 	devlink0/1: type eth netdev eth0 split_group 16
> 	devlink0/2: type eth netdev eth32 split_group 16
> 	
> 	myswitch:~$ sudo dl port unsplit devlink0/1
> 	
> 	myswitch:~$ dl port
> 	devlink0/3: type eth netdev eth1
> 	devlink0/5: type eth netdev eth2
> 	...
> 	devlink0/63: type eth netdev eth31
> 	devlink0/1: type eth netdev eth0
> 
> Ido Schimmel (4):
>   mlxsw: spectrum: Unmap local port from module during teardown
>   mlxsw: spectrum: Store local port to module mapping during init
>   mlxsw: spectrum: Mark unused ports using NULL
>   mlxsw: spectrum: Introduce port splitting
> 
> Jiri Pirko (5):
>   Introduce devlink infrastructure
>   mlx4: Implement devlink interface
>   mlx4: Implement port type setting via devlink interface
>   mlxsw: Implement devlink interface
>   mlxsw: core: Add devlink port splitter callbacks
> 
>  MAINTAINERS                                    |   8 +
>  drivers/infiniband/hw/mlx4/main.c              |   7 +
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   8 +-
>  drivers/net/ethernet/mellanox/mlx4/intf.c      |   9 +
>  drivers/net/ethernet/mellanox/mlx4/main.c      | 129 +++-
>  drivers/net/ethernet/mellanox/mlx4/mlx4.h      |   2 +
>  drivers/net/ethernet/mellanox/mlxsw/core.c     |  56 +-
>  drivers/net/ethernet/mellanox/mlxsw/core.h     |   2 +
>  drivers/net/ethernet/mellanox/mlxsw/port.h     |   2 +
>  drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 238 ++++++-
>  drivers/net/ethernet/mellanox/mlxsw/spectrum.h |   8 +-
>  drivers/net/ethernet/mellanox/mlxsw/switchx2.c |  20 +
>  include/linux/mlx4/driver.h                    |   3 +
>  include/net/devlink.h                          | 156 +++++
>  include/uapi/linux/devlink.h                   |  73 ++
>  net/Kconfig                                    |   7 +
>  net/core/Makefile                              |   1 +
>  net/core/devlink.c                             | 887 +++++++++++++++++++++++++
>  18 files changed, 1557 insertions(+), 59 deletions(-)
>  create mode 100644 include/net/devlink.h
>  create mode 100644 include/uapi/linux/devlink.h
>  create mode 100644 net/core/devlink.c
> 
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 0/9] Introduce devlink interface and first drivers to use it
  2016-02-23  5:12 ` [patch net-next 0/9] Introduce devlink interface and first drivers to use it Andy Gospodarek
@ 2016-02-23  7:32   ` Jiri Pirko
  2016-02-23 14:34     ` Andy Gospodarek
  0 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2016-02-23  7:32 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

Tue, Feb 23, 2016 at 06:12:15AM CET, gospo@cumulusnetworks.com wrote:
>On Mon, Feb 22, 2016 at 07:31:55PM +0100, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> There a is need for some userspace API that would allow to expose things
>> that are not directly related to any device class like net_device of
>> ib_device, but rather chip-wide/switch-ASIC-wide stuff.
>> 
>> Use cases:
>> 1) get/set of port type (Ethernet/InfiniBand)
>> 2) setting up port splitters - split port into multiple ones and squash again,
>>    enables usage of splitter cable
>> 3) setting up shared buffers - shared among multiple ports within
>>    one chip (work in progress)
>> 4) configuration of switch wide properties - resources division etc - This will
>>    allow to pass configuration that is unacceptable to be passed as
>>    a module option.
>
>I'm generally a fan of use cases #3 and #4 (as we have previously
>discussed), but I'm not sure I agree that the implementation for #2
>right now.
>
>I'm not sure I would like userspace to have control over whether or not
>a port should be split or not when the hardware can be queried to
>determine this.

I was thinking about this myself for a long time and initially it made
more sense to me the approach you are suggesting. But after lot of
thining, I saw a lot of issues, I changed my mind and now I believe that
splitter should be set by user.

when you set the splitter port, you have 2 or 4 ports created in
hardware. They looks exactly the same as unsplitted ports, they are only
wired up differently. There is a netdev created for every splitter port,
user would add it to bridge, bond, vlan, set routes, define static fdb
entry and hw would learn fdbs itself. Now when someone disconnects the
splitter cable, 2 things may happen:
1) yourway. Netdevs disappear, all configs and state info will disappear
   with it.
2) myway. Netdevs stay, only link is down. This is the same behaviour is
   if someone uplugs cable for unplitted port.

There are also similar issues before you plug the splitter cable. So it
makes sense to let user co configure this. He knows what he wants. 1)
does not look like correct behaviour to me, 2) does.



<snip>

>> 	myhost:~$ dl port show
>> 	devlink0/1: type eth netdev ens4
>                             ^^^^^^^^^^^
>> 	devlink0/2: type ib ibdev mlx4_0
>                            ^^^^^^^^^^^^
>I think my only other question about this implementation is whether or
>not one would really want to have the true netdev/ibdev names mapped
>here.
>
>Would be as reasonable to simply specify the type (and there may be more
>types within ethernet that could be useful in multi-chip configurations)
>and then let normal infrastructure that exists today figure out how to
>map the names for the netdevs to the devices?

What normal infrastructure you have in mind? There is no info about
devlink port mapping to netdev/ibdev anywhere. Only here. I might be
missing something but I fail to see what's wrong with it.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 1/9] Introduce devlink infrastructure
  2016-02-22 22:29   ` roopa
@ 2016-02-23  7:47     ` Jiri Pirko
  0 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-23  7:47 UTC (permalink / raw)
  To: roopa
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, nikolay, hadarh,
	jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

Mon, Feb 22, 2016 at 11:29:06PM CET, roopa@cumulusnetworks.com wrote:
>On 2/22/16, 10:31 AM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Introduce devlink infrastructure for drivers to register and expose to
>> userspace via generic Netlink interface.
>>
>> There are two basic objects defined:
>> devlink - one instance for every "parent device", for example switch ASIC
>> devlink port - one instance for every physical port of the device.
>
>Like i have expressed earlier, the only thing that bothers me here is that we are creating a new devlink object for switch port when there
>is an existing netdev object.

Sometimes it is, sometimes it is not (IB). I bevelieve that there is a
need for a "handle" for a physical port, regardless what type it is.
The same instance exists all the time, even if you change the type or if
the netdev is not created yet (init phase)


>
>Is there a chance that the drivers you are targeting can still create netdevs for physical ports ?
>It would make things so much more consistent and simpler to manage without a cost of adding yet another interface.

So you see a problem in having devlink_port handle here? It is just an
index, very simple. But you would rather see something like:
myhost:~$ dl dev show
0: devlink0: bus pci dev 0000:01:00.0
myhost:~$ dl port show
ens4: type eth parent devlink0
ens5: type eth parent devlink0
?

There are 2 problems with that approach:
1) It's hard to use this for IB devices, they don't necessary have
   netdev associated.
2) You have to have the netdevs created (know ifindex) in order to work
   with that from userspace. But there are usecases, for example during
   initialization that netdevs are not yet present and user needs to
   specify port for configuration (that is usecase #4 I listed in the cover
   letter)

But it is very easy to change dl userspace tool to be able to use netdev
names to specify ports when possible. Then you could do something like
for example:
myswitch:~$ sudo dl port split eth0 2


>
>The port splitter support is needed at the netdev api too (ie rtnetlink). Most switchdev drivers that expose netdevs
>would benefit from a native 'ip link' way to configure port splitting. This will be useful for nic drivers too.

Why would it be needed in rtnetlink too if it would be exported using
devlink? I don't get it. Sounds similar like if you would expose
wireless-specific stuff via rtnetlink in parallel to nl80211.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-22 18:31 ` [patch net-next 3/9] mlx4: Implement port type setting via " Jiri Pirko
@ 2016-02-23 11:26   ` Hannes Frederic Sowa
  2016-02-23 12:21     ` Jiri Pirko
  2016-02-23 17:31     ` Stephen Hemminger
  0 siblings, 2 replies; 42+ messages in thread
From: Hannes Frederic Sowa @ 2016-02-23 11:26 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, brouer, ivecera, rami.rosen

Hi Jiri,

On 22.02.2016 19:31, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> So far, there has been an mlx4-specific sysfs file allowing user to
> change port type to either Ethernet of InfiniBand. This is very
> inconvenient.

Again, I want to express my concerns regarding all of this until this 
will be integrated into udev/systemd for stable device names. While one 
can build wrapper code around devlink to have stable devlink ports, I 
don't see a reason to include kernel code which actually has more 
problems than the sysfs approach. This harms admins to use those devices 
and will additionally require user space to write boiler plate code.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 11:26   ` Hannes Frederic Sowa
@ 2016-02-23 12:21     ` Jiri Pirko
  2016-02-23 13:28       ` Hannes Frederic Sowa
  2016-02-23 18:16       ` David Miller
  2016-02-23 17:31     ` Stephen Hemminger
  1 sibling, 2 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-23 12:21 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

Tue, Feb 23, 2016 at 12:26:00PM CET, hannes@stressinduktion.org wrote:
>Hi Jiri,
>
>On 22.02.2016 19:31, Jiri Pirko wrote:
>>From: Jiri Pirko <jiri@mellanox.com>
>>
>>So far, there has been an mlx4-specific sysfs file allowing user to
>>change port type to either Ethernet of InfiniBand. This is very
>>inconvenient.
>
>Again, I want to express my concerns regarding all of this until this will be
>integrated into udev/systemd for stable device names. While one can build
>wrapper code around devlink to have stable devlink ports, I don't see a
>reason to include kernel code which actually has more problems than the sysfs
>approach. This harms admins to use those devices and will additionally
>require user space to write boiler plate code.

Sysfs is not the place to do this things. It was already discussed here
multiple times. There was and attempt to use configfs, which was also
refused. Netlink is the only place to go. For multiple reasons,
including well defined api and behaviour, notifications, etc.

I think it is quite trivial to teach udev to name devlinkX devices
according to pci address (or any other address). That's all what is
needed here. I don't understand your concerns.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 12:21     ` Jiri Pirko
@ 2016-02-23 13:28       ` Hannes Frederic Sowa
  2016-02-23 14:26         ` Jiri Pirko
  2016-02-23 18:16       ` David Miller
  1 sibling, 1 reply; 42+ messages in thread
From: Hannes Frederic Sowa @ 2016-02-23 13:28 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

On 23.02.2016 13:21, Jiri Pirko wrote:
> Tue, Feb 23, 2016 at 12:26:00PM CET, hannes@stressinduktion.org wrote:
>> Hi Jiri,
>>
>> On 22.02.2016 19:31, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> So far, there has been an mlx4-specific sysfs file allowing user to
>>> change port type to either Ethernet of InfiniBand. This is very
>>> inconvenient.
>>
>> Again, I want to express my concerns regarding all of this until this will be
>> integrated into udev/systemd for stable device names. While one can build
>> wrapper code around devlink to have stable devlink ports, I don't see a
>> reason to include kernel code which actually has more problems than the sysfs
>> approach. This harms admins to use those devices and will additionally
>> require user space to write boiler plate code.
>
> Sysfs is not the place to do this things. It was already discussed here
> multiple times. There was and attempt to use configfs, which was also
> refused. Netlink is the only place to go. For multiple reasons,
> including well defined api and behaviour, notifications, etc.

I am not against netlink at all. My fear with this interface is simply:

1) we introduce another ifindex/name like identifiers. It took a long 
time until this stuff finally worked fine with linux. It needs 
persistent storage in userspace being applied at boot time. Why this 
complications for this probably lesser often used interface?

2) The actual devlink attributes get managed from inside devlink and not 
the driver. So driver need to modify devlink.c/devlink.h in core to add 
new attributes.

1) is easily solvable, just drop the ifindex style attributes and always 
force the user to enter the bus and bus-topology id.

For 2) I don't really know what drivers want, not sure if it is easier 
to add some small helper functions to add sysfs attributes to kobjects 
without necessarily holding a net_device. Thus mellanox drivers can use 
it and I am not sure how many other networking cards allow switching 
ports between ib and eth type. Port splitting only happens for 
interfaces which already have a net_device, no?

> I think it is quite trivial to teach udev to name devlinkX devices
> according to pci address (or any other address). That's all what is
> needed here. I don't understand your concerns.

I don't think that this interface needs the same complexity as network 
interfaces.

I am not sure, but one of the initial problems was that this information 
should already be there before the driver actually gets loaded, no? 
These changes don't solve this problem either?

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 13:28       ` Hannes Frederic Sowa
@ 2016-02-23 14:26         ` Jiri Pirko
  2016-02-23 15:16           ` Hannes Frederic Sowa
  2016-02-23 15:20           ` Andy Gospodarek
  0 siblings, 2 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-23 14:26 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

Tue, Feb 23, 2016 at 02:28:05PM CET, hannes@stressinduktion.org wrote:
>On 23.02.2016 13:21, Jiri Pirko wrote:
>>Tue, Feb 23, 2016 at 12:26:00PM CET, hannes@stressinduktion.org wrote:
>>>Hi Jiri,
>>>
>>>On 22.02.2016 19:31, Jiri Pirko wrote:
>>>>From: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>>So far, there has been an mlx4-specific sysfs file allowing user to
>>>>change port type to either Ethernet of InfiniBand. This is very
>>>>inconvenient.
>>>
>>>Again, I want to express my concerns regarding all of this until this will be
>>>integrated into udev/systemd for stable device names. While one can build
>>>wrapper code around devlink to have stable devlink ports, I don't see a
>>>reason to include kernel code which actually has more problems than the sysfs
>>>approach. This harms admins to use those devices and will additionally
>>>require user space to write boiler plate code.
>>
>>Sysfs is not the place to do this things. It was already discussed here
>>multiple times. There was and attempt to use configfs, which was also
>>refused. Netlink is the only place to go. For multiple reasons,
>>including well defined api and behaviour, notifications, etc.
>
>I am not against netlink at all. My fear with this interface is simply:
>
>1) we introduce another ifindex/name like identifiers. It took a long time
>until this stuff finally worked fine with linux. It needs persistent storage
>in userspace being applied at boot time. Why this complications for this
>probably lesser often used interface?

Lesser often where? On switches, this interface will be used all the
time. You have to have some handle to manipulate the chip-wide stuff. In
our case it is devlink0. Similar to wireless, they have phy0. I believe
it is completely legit.


>
>2) The actual devlink attributes get managed from inside devlink and not the
>driver. So driver need to modify devlink.c/devlink.h in core to add new
>attributes.

That is exactly the point! Vendors cannot add their own specific crap,
they have to do things in generic way and extend devlink iface
accordingly. That's what we do now with ASIC shared buffer configuration
via devlink for example (in addition to port type and splitter).


>
>1) is easily solvable, just drop the ifindex style attributes and always
>force the user to enter the bus and bus-topology id.

But why? Use can easily get that info and map it to devlink index. It
aligns with nl80211 iface.

Do you really want to do commands like:
myhost:~$ dl dev show pci_0000:01:00.0
?


>
>For 2) I don't really know what drivers want, not sure if it is easier to add
>some small helper functions to add sysfs attributes to kobjects without
>necessarily holding a net_device. Thus mellanox drivers can use it and I am
>not sure how many other networking cards allow switching ports between ib and
>eth type. Port splitting only happens for interfaces which already have a
>net_device, no?

Not necessarily. IB ports that has no net_device could be split as well.
Hannes, again, sysfs approach was refused couple of times in past for this
purpose. Please leave sysfs alone.


>
>>I think it is quite trivial to teach udev to name devlinkX devices
>>according to pci address (or any other address). That's all what is
>>needed here. I don't understand your concerns.
>
>I don't think that this interface needs the same complexity as network
>interfaces.

Again, it aligns nicely with what they to in wireless in nl80211
interface. I don't see any complexity.


>
>I am not sure, but one of the initial problems was that this information
>should already be there before the driver actually gets loaded, no? These
>changes don't solve this problem either?

This is planned to be implemented in near future. Basically there would
be possible to use DEVLINK_CMD_NEW to add devlink iface for specific device
even before the driver gets loaded to serve as a place holder to set values
of some predefined set of options. Once the driver registers, it can read
those and act accordingly. For example, we need that to set "profile" of
our asic. This is a substitute to module options which are completely
inappropriate for this usecase.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 0/9] Introduce devlink interface and first drivers to use it
  2016-02-23  7:32   ` Jiri Pirko
@ 2016-02-23 14:34     ` Andy Gospodarek
  2016-02-23 14:45       ` Jiri Pirko
  0 siblings, 1 reply; 42+ messages in thread
From: Andy Gospodarek @ 2016-02-23 14:34 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

On Tue, Feb 23, 2016 at 08:32:03AM +0100, Jiri Pirko wrote:
> Tue, Feb 23, 2016 at 06:12:15AM CET, gospo@cumulusnetworks.com wrote:
> >On Mon, Feb 22, 2016 at 07:31:55PM +0100, Jiri Pirko wrote:
> >> From: Jiri Pirko <jiri@mellanox.com>
> >> 
> >> There a is need for some userspace API that would allow to expose things
> >> that are not directly related to any device class like net_device of
> >> ib_device, but rather chip-wide/switch-ASIC-wide stuff.
> >> 
> >> Use cases:
> >> 1) get/set of port type (Ethernet/InfiniBand)
> >> 2) setting up port splitters - split port into multiple ones and squash again,
> >>    enables usage of splitter cable
> >> 3) setting up shared buffers - shared among multiple ports within
> >>    one chip (work in progress)
> >> 4) configuration of switch wide properties - resources division etc - This will
> >>    allow to pass configuration that is unacceptable to be passed as
> >>    a module option.
> >
> >I'm generally a fan of use cases #3 and #4 (as we have previously
> >discussed), but I'm not sure I agree that the implementation for #2
> >right now.
> >
> >I'm not sure I would like userspace to have control over whether or not
> >a port should be split or not when the hardware can be queried to
> >determine this.
> 
> I was thinking about this myself for a long time and initially it made
> more sense to me the approach you are suggesting. But after lot of
> thining, I saw a lot of issues, I changed my mind and now I believe that
> splitter should be set by user.

It seems like now we have convinced each other to change our initial
positions.

As you recall I was previously in favor of handling mapping of
switch-ports to physical ports in netlink and performing breakout
configuration that way as well.  Having Mellanox hardware perform in the
way you describe would certainly not prevent a solution to come later
that did automatic detection and reconfiguration, so if the concensus is
that handling breakouts in this way is OK, you are not going to see
significant argument from me.  (I checked back with a few others on what
is done in the closed-source world and some manual intervention is often
needed there as well, so let's stick with what you have).
 
> when you set the splitter port, you have 2 or 4 ports created in
> hardware. They looks exactly the same as unsplitted ports, they are only
> wired up differently. There is a netdev created for every splitter port,
> user would add it to bridge, bond, vlan, set routes, define static fdb
> entry and hw would learn fdbs itself. Now when someone disconnects the
> splitter cable, 2 things may happen:
> 1) yourway. Netdevs disappear, all configs and state info will disappear
>    with it.
> 2) myway. Netdevs stay, only link is down. This is the same behaviour is
>    if someone uplugs cable for unplitted port.
> 
> There are also similar issues before you plug the splitter cable. So it
> makes sense to let user co configure this. He knows what he wants. 1)
> does not look like correct behaviour to me, 2) does.

The bottom line here is that for the switch offload case most chips need
to be reconfigured significantly to function in a way that makes moving
from 1x100GbE > 2x50GbE, so whether one use case is better or worse
doesn't really matter that much.
> 
> 
> 
> <snip>
> 
> >> 	myhost:~$ dl port show
> >> 	devlink0/1: type eth netdev ens4
> >                             ^^^^^^^^^^^
> >> 	devlink0/2: type ib ibdev mlx4_0
> >                            ^^^^^^^^^^^^
> >I think my only other question about this implementation is whether or
> >not one would really want to have the true netdev/ibdev names mapped
> >here.
> >
> >Would be as reasonable to simply specify the type (and there may be more
> >types within ethernet that could be useful in multi-chip configurations)
> >and then let normal infrastructure that exists today figure out how to
> >map the names for the netdevs to the devices?
> 
> What normal infrastructure you have in mind? There is no info about
> devlink port mapping to netdev/ibdev anywhere. Only here. I might be
> missing something but I fail to see what's wrong with it.

I was simply wondering out loud if we _really_ wanted to name netdevs
this way.  I was suggesting that output could be like this:

myhost:~$ dl port show
devlink0/1: type eth
devlink0/2: type ib 

mnd that udev/systemd/biosdevname/etc would take care of naming the
device whataever it wanted.  This appears to be essentially the same
concern Hannes has.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 0/9] Introduce devlink interface and first drivers to use it
  2016-02-23 14:34     ` Andy Gospodarek
@ 2016-02-23 14:45       ` Jiri Pirko
  2016-02-23 15:55         ` Andy Gospodarek
  0 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2016-02-23 14:45 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

Tue, Feb 23, 2016 at 03:34:19PM CET, gospo@cumulusnetworks.com wrote:

<snip>
>> 
>> >> 	myhost:~$ dl port show
>> >> 	devlink0/1: type eth netdev ens4
>> >                             ^^^^^^^^^^^
>> >> 	devlink0/2: type ib ibdev mlx4_0
>> >                            ^^^^^^^^^^^^
>> >I think my only other question about this implementation is whether or
>> >not one would really want to have the true netdev/ibdev names mapped
>> >here.
>> >
>> >Would be as reasonable to simply specify the type (and there may be more
>> >types within ethernet that could be useful in multi-chip configurations)
>> >and then let normal infrastructure that exists today figure out how to
>> >map the names for the netdevs to the devices?
>> 
>> What normal infrastructure you have in mind? There is no info about
>> devlink port mapping to netdev/ibdev anywhere. Only here. I might be
>> missing something but I fail to see what's wrong with it.
>
>I was simply wondering out loud if we _really_ wanted to name netdevs
>this way.  I was suggesting that output could be like this:
>
>myhost:~$ dl port show
>devlink0/1: type eth
>devlink0/2: type ib 
>
>mnd that udev/systemd/biosdevname/etc would take care of naming the
>device whataever it wanted.  This appears to be essentially the same
>concern Hannes has.

Wait. The only thing which will be renamed by udev is "devlink0". The
suffixes "/1" and "/2" are direct indexes as used inside the driver.

And you need some link to netdev in case netdev exists - therefore
"netdev ens4" attribute is there. There's no other way to get the
mapping of "devlink0/1" to "ens4" anywhere else.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 14:26         ` Jiri Pirko
@ 2016-02-23 15:16           ` Hannes Frederic Sowa
  2016-02-23 15:30             ` Jiri Pirko
  2016-02-23 15:20           ` Andy Gospodarek
  1 sibling, 1 reply; 42+ messages in thread
From: Hannes Frederic Sowa @ 2016-02-23 15:16 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

On 23.02.2016 15:26, Jiri Pirko wrote:
> Tue, Feb 23, 2016 at 02:28:05PM CET, hannes@stressinduktion.org wrote:
>> On 23.02.2016 13:21, Jiri Pirko wrote:
>>> Tue, Feb 23, 2016 at 12:26:00PM CET, hannes@stressinduktion.org wrote:
>>>> Hi Jiri,
>>>>
>>>> On 22.02.2016 19:31, Jiri Pirko wrote:
>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>>
>>>>> So far, there has been an mlx4-specific sysfs file allowing user to
>>>>> change port type to either Ethernet of InfiniBand. This is very
>>>>> inconvenient.
>>>>
>>>> Again, I want to express my concerns regarding all of this until this will be
>>>> integrated into udev/systemd for stable device names. While one can build
>>>> wrapper code around devlink to have stable devlink ports, I don't see a
>>>> reason to include kernel code which actually has more problems than the sysfs
>>>> approach. This harms admins to use those devices and will additionally
>>>> require user space to write boiler plate code.
>>>
>>> Sysfs is not the place to do this things. It was already discussed here
>>> multiple times. There was and attempt to use configfs, which was also
>>> refused. Netlink is the only place to go. For multiple reasons,
>>> including well defined api and behaviour, notifications, etc.
>>
>> I am not against netlink at all. My fear with this interface is simply:
>>
>> 1) we introduce another ifindex/name like identifiers. It took a long time
>> until this stuff finally worked fine with linux. It needs persistent storage
>> in userspace being applied at boot time. Why this complications for this
>> probably lesser often used interface?
>
> Lesser often where? On switches, this interface will be used all the
> time. You have to have some handle to manipulate the chip-wide stuff. In
> our case it is devlink0. Similar to wireless, they have phy0. I believe
> it is completely legit.

Lesser often as you e.g. refer to the interface name in nftables or 
netfilter, or in setsockopt etc. They are not being referenced as often 
as interface names, so the question is: do they need nice looking names?

>> 2) The actual devlink attributes get managed from inside devlink and not the
>> driver. So driver need to modify devlink.c/devlink.h in core to add new
>> attributes.
>
> That is exactly the point! Vendors cannot add their own specific crap,
> they have to do things in generic way and extend devlink iface
> accordingly. That's what we do now with ASIC shared buffer configuration
> via devlink for example (in addition to port type and splitter).

If this is part of the design, okay.

>> 1) is easily solvable, just drop the ifindex style attributes and always
>> force the user to enter the bus and bus-topology id.
>
> But why? Use can easily get that info and map it to devlink index. It
> aligns with nl80211 iface.
>
> Do you really want to do commands like:
> myhost:~$ dl dev show pci_0000:01:00.0
 > ?

Yes, exactly I would. I would put them into a boot-up script based on my 
system configuration and can be sure it will work the next boot, too, 
and adapt them when I replace the hardware or do some configuration changes.

I think sysadmins or scripts are the primary users of this interface not 
kernel developers which switch their settings around all the time, no?

>> For 2) I don't really know what drivers want, not sure if it is easier to add
>> some small helper functions to add sysfs attributes to kobjects without
>> necessarily holding a net_device. Thus mellanox drivers can use it and I am
>> not sure how many other networking cards allow switching ports between ib and
>> eth type. Port splitting only happens for interfaces which already have a
>> net_device, no?
>
> Not necessarily. IB ports that has no net_device could be split as well.
> Hannes, again, sysfs approach was refused couple of times in past for this
> purpose. Please leave sysfs alone.

Sorry, I couldn't find the references or the reasons.

Actually the sysfs knob is in the kernel right now.

>>> I think it is quite trivial to teach udev to name devlinkX devices
>>> according to pci address (or any other address). That's all what is
>>> needed here. I don't understand your concerns.
>>
>> I don't think that this interface needs the same complexity as network
>> interfaces.
>
> Again, it aligns nicely with what they to in wireless in nl80211
> interface. I don't see any complexity.

The interface names must be kept stable from user space.

Sorry to be such a pedantic ass*** here, but isn't nl80211 the other way 
around? You have an interface as an anchor and can use that to discover 
the other interfaces using the same phy?
I have no experience here how those get managed by wpa_supplicant, but 
at least as a user, you specific interfaces and not phys.

I look more into this and how they deal with that, thanks.

>> I am not sure, but one of the initial problems was that this information
>> should already be there before the driver actually gets loaded, no? These
>> changes don't solve this problem either?
>
> This is planned to be implemented in near future. Basically there would
> be possible to use DEVLINK_CMD_NEW to add devlink iface for specific device
> even before the driver gets loaded to serve as a place holder to set values
> of some predefined set of options. Once the driver registers, it can read
> those and act accordingly. For example, we need that to set "profile" of
> our asic. This is a substitute to module options which are completely
> inappropriate for this usecase.

Okay, interesting.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 14:26         ` Jiri Pirko
  2016-02-23 15:16           ` Hannes Frederic Sowa
@ 2016-02-23 15:20           ` Andy Gospodarek
  2016-02-23 15:31             ` Jiri Pirko
  1 sibling, 1 reply; 42+ messages in thread
From: Andy Gospodarek @ 2016-02-23 15:20 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Hannes Frederic Sowa, netdev, davem, idosch, eladr, yotamg,
	ogerlitz, yishaih, dledford, sean.hefty, hal.rosenstock, eugenia,
	roopa, nikolay, hadarh, jhs, john.fastabend, jeffrey.t.kirsher,
	brouer, ivecera, rami.rosen

On Tue, Feb 23, 2016 at 03:26:27PM +0100, Jiri Pirko wrote:
> Tue, Feb 23, 2016 at 02:28:05PM CET, hannes@stressinduktion.org wrote:
> >On 23.02.2016 13:21, Jiri Pirko wrote:
> >>Tue, Feb 23, 2016 at 12:26:00PM CET, hannes@stressinduktion.org wrote:
> >>>Hi Jiri,
> >>>
> >>>On 22.02.2016 19:31, Jiri Pirko wrote:
> >>>>From: Jiri Pirko <jiri@mellanox.com>
> >>>>
> >>>>So far, there has been an mlx4-specific sysfs file allowing user to
> >>>>change port type to either Ethernet of InfiniBand. This is very
> >>>>inconvenient.
> >>>
> >>>Again, I want to express my concerns regarding all of this until this will be
> >>>integrated into udev/systemd for stable device names. While one can build
> >>>wrapper code around devlink to have stable devlink ports, I don't see a
> >>>reason to include kernel code which actually has more problems than the sysfs
> >>>approach. This harms admins to use those devices and will additionally
> >>>require user space to write boiler plate code.
> >>
> >>Sysfs is not the place to do this things. It was already discussed here
> >>multiple times. There was and attempt to use configfs, which was also
> >>refused. Netlink is the only place to go. For multiple reasons,
> >>including well defined api and behaviour, notifications, etc.
> >
> >I am not against netlink at all. My fear with this interface is simply:
> >
> >1) we introduce another ifindex/name like identifiers. It took a long time
> >until this stuff finally worked fine with linux. It needs persistent storage
> >in userspace being applied at boot time. Why this complications for this
> >probably lesser often used interface?
> 
> Lesser often where? On switches, this interface will be used all the
> time. You have to have some handle to manipulate the chip-wide stuff. In
> our case it is devlink0. Similar to wireless, they have phy0. I believe
> it is completely legit.
> 
> 
> >
> >2) The actual devlink attributes get managed from inside devlink and not the
> >driver. So driver need to modify devlink.c/devlink.h in core to add new
> >attributes.
> 
> That is exactly the point! Vendors cannot add their own specific crap,
> they have to do things in generic way and extend devlink iface
> accordingly. That's what we do now with ASIC shared buffer configuration
> via devlink for example (in addition to port type and splitter).
> 
> 
> >
> >1) is easily solvable, just drop the ifindex style attributes and always
> >force the user to enter the bus and bus-topology id.
> 
> But why? Use can easily get that info and map it to devlink index. It
> aligns with nl80211 iface.
> 
> Do you really want to do commands like:
> myhost:~$ dl dev show pci_0000:01:00.0
> ?
> 
> 
> >
> >For 2) I don't really know what drivers want, not sure if it is easier to add
> >some small helper functions to add sysfs attributes to kobjects without
> >necessarily holding a net_device. Thus mellanox drivers can use it and I am
> >not sure how many other networking cards allow switching ports between ib and
> >eth type. Port splitting only happens for interfaces which already have a
> >net_device, no?
> 
> Not necessarily. IB ports that has no net_device could be split as well.
> Hannes, again, sysfs approach was refused couple of times in past for this
> purpose. Please leave sysfs alone.
> 
> 
> >
> >>I think it is quite trivial to teach udev to name devlinkX devices
> >>according to pci address (or any other address). That's all what is
> >>needed here. I don't understand your concerns.
> >
> >I don't think that this interface needs the same complexity as network
> >interfaces.
> 
> Again, it aligns nicely with what they to in wireless in nl80211
> interface. I don't see any complexity.
> 
> 
> >
> >I am not sure, but one of the initial problems was that this information
> >should already be there before the driver actually gets loaded, no? These
> >changes don't solve this problem either?
> 
> This is planned to be implemented in near future. Basically there would
> be possible to use DEVLINK_CMD_NEW to add devlink iface for specific device
> even before the driver gets loaded to serve as a place holder to set values
s/driver/network driver/ right?

> of some predefined set of options. Once the driver registers, it can read
> those and act accordingly. For example, we need that to set "profile" of
> our asic. This is a substitute to module options which are completely
> inappropriate for this usecase.

FWIW, I DO like the idea that the PCI driver contains this information
and netdev creation in the network driver depends on this mapping.  We
see these issues on a regular basis and while have solved it other
ways (rtnl_link_ops and genl which is why I like a cross-vendor way to
do it like this).

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 15:16           ` Hannes Frederic Sowa
@ 2016-02-23 15:30             ` Jiri Pirko
  2016-02-23 15:57               ` Hannes Frederic Sowa
  0 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2016-02-23 15:30 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

Tue, Feb 23, 2016 at 04:16:11PM CET, hannes@stressinduktion.org wrote:

<snip>

>>>1) is easily solvable, just drop the ifindex style attributes and always
>>>force the user to enter the bus and bus-topology id.
>>
>>But why? Use can easily get that info and map it to devlink index. It
>>aligns with nl80211 iface.
>>
>>Do you really want to do commands like:
>>myhost:~$ dl dev show pci_0000:01:00.0
>> ?
>
>Yes, exactly I would. I would put them into a boot-up script based on my
>system configuration and can be sure it will work the next boot, too, and
>adapt them when I replace the hardware or do some configuration changes.
>
>I think sysadmins or scripts are the primary users of this interface not
>kernel developers which switch their settings around all the time, no?

I can easily add this to the userspace tool to accept "pci_0000:01:00.0"
format and to map it internally to devlink index. No problem.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 15:20           ` Andy Gospodarek
@ 2016-02-23 15:31             ` Jiri Pirko
  0 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-23 15:31 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Hannes Frederic Sowa, netdev, davem, idosch, eladr, yotamg,
	ogerlitz, yishaih, dledford, sean.hefty, hal.rosenstock, eugenia,
	roopa, nikolay, hadarh, jhs, john.fastabend, jeffrey.t.kirsher,
	brouer, ivecera, rami.rosen

Tue, Feb 23, 2016 at 04:20:09PM CET, gospo@cumulusnetworks.com wrote:

<snip>

>> >
>> >I am not sure, but one of the initial problems was that this information
>> >should already be there before the driver actually gets loaded, no? These
>> >changes don't solve this problem either?
>> 
>> This is planned to be implemented in near future. Basically there would
>> be possible to use DEVLINK_CMD_NEW to add devlink iface for specific device
>> even before the driver gets loaded to serve as a place holder to set values
>s/driver/network driver/ right?

right.

>
>> of some predefined set of options. Once the driver registers, it can read
>> those and act accordingly. For example, we need that to set "profile" of
>> our asic. This is a substitute to module options which are completely
>> inappropriate for this usecase.
>
>FWIW, I DO like the idea that the PCI driver contains this information
>and netdev creation in the network driver depends on this mapping.  We
>see these issues on a regular basis and while have solved it other
>ways (rtnl_link_ops and genl which is why I like a cross-vendor way to
>do it like this).
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 0/9] Introduce devlink interface and first drivers to use it
  2016-02-23 14:45       ` Jiri Pirko
@ 2016-02-23 15:55         ` Andy Gospodarek
  2016-02-23 16:01           ` Jiri Pirko
  0 siblings, 1 reply; 42+ messages in thread
From: Andy Gospodarek @ 2016-02-23 15:55 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

On Tue, Feb 23, 2016 at 03:45:51PM +0100, Jiri Pirko wrote:
> Tue, Feb 23, 2016 at 03:34:19PM CET, gospo@cumulusnetworks.com wrote:
> 
> <snip>
> >> 
> >> >> 	myhost:~$ dl port show
> >> >> 	devlink0/1: type eth netdev ens4
> >> >                             ^^^^^^^^^^^
> >> >> 	devlink0/2: type ib ibdev mlx4_0
> >> >                            ^^^^^^^^^^^^
> >> >I think my only other question about this implementation is whether or
> >> >not one would really want to have the true netdev/ibdev names mapped
> >> >here.
> >> >
> >> >Would be as reasonable to simply specify the type (and there may be more
> >> >types within ethernet that could be useful in multi-chip configurations)
> >> >and then let normal infrastructure that exists today figure out how to
> >> >map the names for the netdevs to the devices?
> >> 
> >> What normal infrastructure you have in mind? There is no info about
> >> devlink port mapping to netdev/ibdev anywhere. Only here. I might be
> >> missing something but I fail to see what's wrong with it.
> >
> >I was simply wondering out loud if we _really_ wanted to name netdevs
> >this way.  I was suggesting that output could be like this:
> >
> >myhost:~$ dl port show
> >devlink0/1: type eth
> >devlink0/2: type ib 
> >
> >mnd that udev/systemd/biosdevname/etc would take care of naming the
> >device whataever it wanted.  This appears to be essentially the same
> >concern Hannes has.
> 
> Wait. The only thing which will be renamed by udev is "devlink0". The
> suffixes "/1" and "/2" are direct indexes as used inside the driver.
> 
> And you need some link to netdev in case netdev exists - therefore
> "netdev ens4" attribute is there. There's no other way to get the
> mapping of "devlink0/1" to "ens4" anywhere else.

So I think I had invisioned a slightly different workflow than what you
just described.

- Load PCI driver
- Setup devlink attributes for your hardware
- Create netdevs in network driver based on those attributes

You don't need a netdev to reference any of the devlink specific
parameters do you?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 15:30             ` Jiri Pirko
@ 2016-02-23 15:57               ` Hannes Frederic Sowa
  2016-02-23 16:04                 ` Jiri Pirko
  0 siblings, 1 reply; 42+ messages in thread
From: Hannes Frederic Sowa @ 2016-02-23 15:57 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

On 23.02.2016 16:30, Jiri Pirko wrote:
> Tue, Feb 23, 2016 at 04:16:11PM CET, hannes@stressinduktion.org wrote:
>
> <snip>
>
>>>> 1) is easily solvable, just drop the ifindex style attributes and always
>>>> force the user to enter the bus and bus-topology id.
>>>
>>> But why? Use can easily get that info and map it to devlink index. It
>>> aligns with nl80211 iface.
>>>
>>> Do you really want to do commands like:
>>> myhost:~$ dl dev show pci_0000:01:00.0
>>> ?
>>
>> Yes, exactly I would. I would put them into a boot-up script based on my
>> system configuration and can be sure it will work the next boot, too, and
>> adapt them when I replace the hardware or do some configuration changes.
>>
>> I think sysadmins or scripts are the primary users of this interface not
>> kernel developers which switch their settings around all the time, no?
>
> I can easily add this to the userspace tool to accept "pci_0000:01:00.0"
> format and to map it internally to devlink index. No problem.

I argue for this stable topology identifier to be the default. 
Especially if you add device info before the actual module is loaded 
(this is during initramfs, when udev cannot rename devlink names to 
stable ones), a user has to deal with pre-devlink-ids before rename and 
after. Do you have plans how to address that?

Current initramfs for stable interface names uses EUI48 based mac 
addresses most of the time and udev runs then after the pivot_root.

The devlink names can easily be aliases in user space.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 0/9] Introduce devlink interface and first drivers to use it
  2016-02-23 15:55         ` Andy Gospodarek
@ 2016-02-23 16:01           ` Jiri Pirko
  2016-02-23 16:19             ` Andy Gospodarek
  0 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2016-02-23 16:01 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

Tue, Feb 23, 2016 at 04:55:28PM CET, gospo@cumulusnetworks.com wrote:
>On Tue, Feb 23, 2016 at 03:45:51PM +0100, Jiri Pirko wrote:
>> Tue, Feb 23, 2016 at 03:34:19PM CET, gospo@cumulusnetworks.com wrote:
>> 
>> <snip>
>> >> 
>> >> >> 	myhost:~$ dl port show
>> >> >> 	devlink0/1: type eth netdev ens4
>> >> >                             ^^^^^^^^^^^
>> >> >> 	devlink0/2: type ib ibdev mlx4_0
>> >> >                            ^^^^^^^^^^^^
>> >> >I think my only other question about this implementation is whether or
>> >> >not one would really want to have the true netdev/ibdev names mapped
>> >> >here.
>> >> >
>> >> >Would be as reasonable to simply specify the type (and there may be more
>> >> >types within ethernet that could be useful in multi-chip configurations)
>> >> >and then let normal infrastructure that exists today figure out how to
>> >> >map the names for the netdevs to the devices?
>> >> 
>> >> What normal infrastructure you have in mind? There is no info about
>> >> devlink port mapping to netdev/ibdev anywhere. Only here. I might be
>> >> missing something but I fail to see what's wrong with it.
>> >
>> >I was simply wondering out loud if we _really_ wanted to name netdevs
>> >this way.  I was suggesting that output could be like this:
>> >
>> >myhost:~$ dl port show
>> >devlink0/1: type eth
>> >devlink0/2: type ib 
>> >
>> >mnd that udev/systemd/biosdevname/etc would take care of naming the
>> >device whataever it wanted.  This appears to be essentially the same
>> >concern Hannes has.
>> 
>> Wait. The only thing which will be renamed by udev is "devlink0". The
>> suffixes "/1" and "/2" are direct indexes as used inside the driver.
>> 
>> And you need some link to netdev in case netdev exists - therefore
>> "netdev ens4" attribute is there. There's no other way to get the
>> mapping of "devlink0/1" to "ens4" anywhere else.
>
>So I think I had invisioned a slightly different workflow than what you
>just described.
>
>- Load PCI driver
>- Setup devlink attributes for your hardware
>- Create netdevs in network driver based on those attributes
>
>You don't need a netdev to reference any of the devlink specific
>parameters do you?

No, I have a devlink handle and a devlink port index. That is enough.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 15:57               ` Hannes Frederic Sowa
@ 2016-02-23 16:04                 ` Jiri Pirko
  2016-02-23 16:45                   ` Hannes Frederic Sowa
  0 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2016-02-23 16:04 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

Tue, Feb 23, 2016 at 04:57:17PM CET, hannes@stressinduktion.org wrote:
>On 23.02.2016 16:30, Jiri Pirko wrote:
>>Tue, Feb 23, 2016 at 04:16:11PM CET, hannes@stressinduktion.org wrote:
>>
>><snip>
>>
>>>>>1) is easily solvable, just drop the ifindex style attributes and always
>>>>>force the user to enter the bus and bus-topology id.
>>>>
>>>>But why? Use can easily get that info and map it to devlink index. It
>>>>aligns with nl80211 iface.
>>>>
>>>>Do you really want to do commands like:
>>>>myhost:~$ dl dev show pci_0000:01:00.0
>>>>?
>>>
>>>Yes, exactly I would. I would put them into a boot-up script based on my
>>>system configuration and can be sure it will work the next boot, too, and
>>>adapt them when I replace the hardware or do some configuration changes.
>>>
>>>I think sysadmins or scripts are the primary users of this interface not
>>>kernel developers which switch their settings around all the time, no?
>>
>>I can easily add this to the userspace tool to accept "pci_0000:01:00.0"
>>format and to map it internally to devlink index. No problem.
>
>I argue for this stable topology identifier to be the default. Especially if
>you add device info before the actual module is loaded (this is during
>initramfs, when udev cannot rename devlink names to stable ones), a user has
>to deal with pre-devlink-ids before rename and after. Do you have plans how
>to address that?

You can still access devlink using pci_addr using dl. I don't see a
problem.


>
>Current initramfs for stable interface names uses EUI48 based mac addresses
>most of the time and udev runs then after the pivot_root.
>
>The devlink names can easily be aliases in user space.

I don't want to store them anywhere. I just use "dl" tool and pass the
name there.


>
>Bye,
>Hannes
>
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 0/9] Introduce devlink interface and first drivers to use it
  2016-02-23 16:01           ` Jiri Pirko
@ 2016-02-23 16:19             ` Andy Gospodarek
  2016-02-23 16:23               ` Jiri Pirko
  0 siblings, 1 reply; 42+ messages in thread
From: Andy Gospodarek @ 2016-02-23 16:19 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

On Tue, Feb 23, 2016 at 05:01:31PM +0100, Jiri Pirko wrote:
> Tue, Feb 23, 2016 at 04:55:28PM CET, gospo@cumulusnetworks.com wrote:
> >On Tue, Feb 23, 2016 at 03:45:51PM +0100, Jiri Pirko wrote:
> >> Tue, Feb 23, 2016 at 03:34:19PM CET, gospo@cumulusnetworks.com wrote:
> >> 
> >> <snip>
> >> >> 
> >> >> >> 	myhost:~$ dl port show
> >> >> >> 	devlink0/1: type eth netdev ens4
> >> >> >                             ^^^^^^^^^^^
> >> >> >> 	devlink0/2: type ib ibdev mlx4_0
> >> >> >                            ^^^^^^^^^^^^
> >> >> >I think my only other question about this implementation is whether or
> >> >> >not one would really want to have the true netdev/ibdev names mapped
> >> >> >here.
> >> >> >
> >> >> >Would be as reasonable to simply specify the type (and there may be more
> >> >> >types within ethernet that could be useful in multi-chip configurations)
> >> >> >and then let normal infrastructure that exists today figure out how to
> >> >> >map the names for the netdevs to the devices?
> >> >> 
> >> >> What normal infrastructure you have in mind? There is no info about
> >> >> devlink port mapping to netdev/ibdev anywhere. Only here. I might be
> >> >> missing something but I fail to see what's wrong with it.
> >> >
> >> >I was simply wondering out loud if we _really_ wanted to name netdevs
> >> >this way.  I was suggesting that output could be like this:
> >> >
> >> >myhost:~$ dl port show
> >> >devlink0/1: type eth
> >> >devlink0/2: type ib 
> >> >
> >> >mnd that udev/systemd/biosdevname/etc would take care of naming the
> >> >device whataever it wanted.  This appears to be essentially the same
> >> >concern Hannes has.
> >> 
> >> Wait. The only thing which will be renamed by udev is "devlink0". The
> >> suffixes "/1" and "/2" are direct indexes as used inside the driver.
> >> 
> >> And you need some link to netdev in case netdev exists - therefore
> >> "netdev ens4" attribute is there. There's no other way to get the
> >> mapping of "devlink0/1" to "ens4" anywhere else.
> >
> >So I think I had invisioned a slightly different workflow than what you
> >just described.
> >
> >- Load PCI driver
> >- Setup devlink attributes for your hardware
> >- Create netdevs in network driver based on those attributes
> >
> >You don't need a netdev to reference any of the devlink specific
> >parameters do you?
> 
> No, I have a devlink handle and a devlink port index. That is enough.

That's what I read as well and why I wondered why you said this:

> >> And you need some link to netdev in case netdev exists - therefore
> >> "netdev ens4" attribute is there. There's no other way to get the
> >> mapping of "devlink0/1" to "ens4" anywhere else.

It just doesn't seem like referencing the netdev when performing set
operations was needed.

I see no issue having it appear a dump action after the fact, but
it seems like devlink[unit number]/[port number] would be enough to
reference the proper hardware mapping.  

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 0/9] Introduce devlink interface and first drivers to use it
  2016-02-23 16:19             ` Andy Gospodarek
@ 2016-02-23 16:23               ` Jiri Pirko
  0 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-23 16:23 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

Tue, Feb 23, 2016 at 05:19:37PM CET, gospo@cumulusnetworks.com wrote:
>On Tue, Feb 23, 2016 at 05:01:31PM +0100, Jiri Pirko wrote:
>> Tue, Feb 23, 2016 at 04:55:28PM CET, gospo@cumulusnetworks.com wrote:
>> >On Tue, Feb 23, 2016 at 03:45:51PM +0100, Jiri Pirko wrote:
>> >> Tue, Feb 23, 2016 at 03:34:19PM CET, gospo@cumulusnetworks.com wrote:
>> >> 
>> >> <snip>
>> >> >> 
>> >> >> >> 	myhost:~$ dl port show
>> >> >> >> 	devlink0/1: type eth netdev ens4
>> >> >> >                             ^^^^^^^^^^^
>> >> >> >> 	devlink0/2: type ib ibdev mlx4_0
>> >> >> >                            ^^^^^^^^^^^^
>> >> >> >I think my only other question about this implementation is whether or
>> >> >> >not one would really want to have the true netdev/ibdev names mapped
>> >> >> >here.
>> >> >> >
>> >> >> >Would be as reasonable to simply specify the type (and there may be more
>> >> >> >types within ethernet that could be useful in multi-chip configurations)
>> >> >> >and then let normal infrastructure that exists today figure out how to
>> >> >> >map the names for the netdevs to the devices?
>> >> >> 
>> >> >> What normal infrastructure you have in mind? There is no info about
>> >> >> devlink port mapping to netdev/ibdev anywhere. Only here. I might be
>> >> >> missing something but I fail to see what's wrong with it.
>> >> >
>> >> >I was simply wondering out loud if we _really_ wanted to name netdevs
>> >> >this way.  I was suggesting that output could be like this:
>> >> >
>> >> >myhost:~$ dl port show
>> >> >devlink0/1: type eth
>> >> >devlink0/2: type ib 
>> >> >
>> >> >mnd that udev/systemd/biosdevname/etc would take care of naming the
>> >> >device whataever it wanted.  This appears to be essentially the same
>> >> >concern Hannes has.
>> >> 
>> >> Wait. The only thing which will be renamed by udev is "devlink0". The
>> >> suffixes "/1" and "/2" are direct indexes as used inside the driver.
>> >> 
>> >> And you need some link to netdev in case netdev exists - therefore
>> >> "netdev ens4" attribute is there. There's no other way to get the
>> >> mapping of "devlink0/1" to "ens4" anywhere else.
>> >
>> >So I think I had invisioned a slightly different workflow than what you
>> >just described.
>> >
>> >- Load PCI driver
>> >- Setup devlink attributes for your hardware
>> >- Create netdevs in network driver based on those attributes
>> >
>> >You don't need a netdev to reference any of the devlink specific
>> >parameters do you?
>> 
>> No, I have a devlink handle and a devlink port index. That is enough.
>
>That's what I read as well and why I wondered why you said this:
>
>> >> And you need some link to netdev in case netdev exists - therefore
>> >> "netdev ens4" attribute is there. There's no other way to get the
>> >> mapping of "devlink0/1" to "ens4" anywhere else.
>
>It just doesn't seem like referencing the netdev when performing set
>operations was needed.
>
>I see no issue having it appear a dump action after the fact, but
>it seems like devlink[unit number]/[port number] would be enough to
>reference the proper hardware mapping.  

I said nothing about needing netdev for set operation Andy. It is there
in a get list as an attribute for your convenience so you know what
netdev that specific port relates to. That's it.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 16:04                 ` Jiri Pirko
@ 2016-02-23 16:45                   ` Hannes Frederic Sowa
  2016-02-23 16:55                     ` Jiri Pirko
  0 siblings, 1 reply; 42+ messages in thread
From: Hannes Frederic Sowa @ 2016-02-23 16:45 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

On 23.02.2016 17:04, Jiri Pirko wrote:
> Tue, Feb 23, 2016 at 04:57:17PM CET, hannes@stressinduktion.org wrote:
>> On 23.02.2016 16:30, Jiri Pirko wrote:
>>> Tue, Feb 23, 2016 at 04:16:11PM CET, hannes@stressinduktion.org wrote:
>>>
>>> <snip>
>>>
>>>>>> 1) is easily solvable, just drop the ifindex style attributes and always
>>>>>> force the user to enter the bus and bus-topology id.
>>>>>
>>>>> But why? Use can easily get that info and map it to devlink index. It
>>>>> aligns with nl80211 iface.
>>>>>
>>>>> Do you really want to do commands like:
>>>>> myhost:~$ dl dev show pci_0000:01:00.0
>>>>> ?
>>>>
>>>> Yes, exactly I would. I would put them into a boot-up script based on my
>>>> system configuration and can be sure it will work the next boot, too, and
>>>> adapt them when I replace the hardware or do some configuration changes.
>>>>
>>>> I think sysadmins or scripts are the primary users of this interface not
>>>> kernel developers which switch their settings around all the time, no?
>>>
>>> I can easily add this to the userspace tool to accept "pci_0000:01:00.0"
>>> format and to map it internally to devlink index. No problem.
>>
>> I argue for this stable topology identifier to be the default. Especially if
>> you add device info before the actual module is loaded (this is during
>> initramfs, when udev cannot rename devlink names to stable ones), a user has
>> to deal with pre-devlink-ids before rename and after. Do you have plans how
>> to address that?
>
> You can still access devlink using pci_addr using dl. I don't see a
> problem.

I don't really see a reason why the devlink indexes/names exist inside 
the kernel instead of a stable topology identifier. They confuse users 
and add more unnecessary code to the kernel. Shells have environment 
variables for that. ;) This is a low-level kernel setting tool IMHO.

I just see the problem that users use the devlink* names and we get 
reports because stuff breaks because they don't use the stable 
identifiers. That is all.

>> Current initramfs for stable interface names uses EUI48 based mac addresses
>> most of the time and udev runs then after the pivot_root.
>>
>> The devlink names can easily be aliases in user space.
>
> I don't want to store them anywhere. I just use "dl" tool and pass the
> name there.

Sorry?

Bye,
Hannes

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 16:45                   ` Hannes Frederic Sowa
@ 2016-02-23 16:55                     ` Jiri Pirko
  2016-02-23 17:07                       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 42+ messages in thread
From: Jiri Pirko @ 2016-02-23 16:55 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

Tue, Feb 23, 2016 at 05:45:07PM CET, hannes@stressinduktion.org wrote:
>On 23.02.2016 17:04, Jiri Pirko wrote:
>>Tue, Feb 23, 2016 at 04:57:17PM CET, hannes@stressinduktion.org wrote:
>>>On 23.02.2016 16:30, Jiri Pirko wrote:
>>>>Tue, Feb 23, 2016 at 04:16:11PM CET, hannes@stressinduktion.org wrote:
>>>>
>>>><snip>
>>>>
>>>>>>>1) is easily solvable, just drop the ifindex style attributes and always
>>>>>>>force the user to enter the bus and bus-topology id.
>>>>>>
>>>>>>But why? Use can easily get that info and map it to devlink index. It
>>>>>>aligns with nl80211 iface.
>>>>>>
>>>>>>Do you really want to do commands like:
>>>>>>myhost:~$ dl dev show pci_0000:01:00.0
>>>>>>?
>>>>>
>>>>>Yes, exactly I would. I would put them into a boot-up script based on my
>>>>>system configuration and can be sure it will work the next boot, too, and
>>>>>adapt them when I replace the hardware or do some configuration changes.
>>>>>
>>>>>I think sysadmins or scripts are the primary users of this interface not
>>>>>kernel developers which switch their settings around all the time, no?
>>>>
>>>>I can easily add this to the userspace tool to accept "pci_0000:01:00.0"
>>>>format and to map it internally to devlink index. No problem.
>>>
>>>I argue for this stable topology identifier to be the default. Especially if
>>>you add device info before the actual module is loaded (this is during
>>>initramfs, when udev cannot rename devlink names to stable ones), a user has
>>>to deal with pre-devlink-ids before rename and after. Do you have plans how
>>>to address that?
>>
>>You can still access devlink using pci_addr using dl. I don't see a
>>problem.
>
>I don't really see a reason why the devlink indexes/names exist inside the
>kernel instead of a stable topology identifier. They confuse users and add
>more unnecessary code to the kernel. Shells have environment variables for
>that. ;) This is a low-level kernel setting tool IMHO.
>
>I just see the problem that users use the devlink* names and we get reports
>because stuff breaks because they don't use the stable identifiers. That is
>all.
>

you can have stable devlink name using udev. For pre-udev usage, you can
use pci address directly. I don't like to use pci address for every
dl command. I would like to have some more convenient name handle -
devlink name.


>>>Current initramfs for stable interface names uses EUI48 based mac addresses
>>>most of the time and udev runs then after the pivot_root.
>>>
>>>The devlink names can easily be aliases in user space.
>>
>>I don't want to store them anywhere. I just use "dl" tool and pass the
>>name there.
>
>Sorry?

you said - "The devlink names can easily be aliases in user space." -
where do you want to store them?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 16:55                     ` Jiri Pirko
@ 2016-02-23 17:07                       ` Hannes Frederic Sowa
  0 siblings, 0 replies; 42+ messages in thread
From: Hannes Frederic Sowa @ 2016-02-23 17:07 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

On 23.02.2016 17:55, Jiri Pirko wrote:
> Tue, Feb 23, 2016 at 05:45:07PM CET, hannes@stressinduktion.org wrote:
>> On 23.02.2016 17:04, Jiri Pirko wrote:
>>> Tue, Feb 23, 2016 at 04:57:17PM CET, hannes@stressinduktion.org wrote:
>>>> On 23.02.2016 16:30, Jiri Pirko wrote:
>>>>> Tue, Feb 23, 2016 at 04:16:11PM CET, hannes@stressinduktion.org wrote:
>>>>>
>>>>> <snip>
>>>>>
>>>>>>>> 1) is easily solvable, just drop the ifindex style attributes and always
>>>>>>>> force the user to enter the bus and bus-topology id.
>>>>>>>
>>>>>>> But why? Use can easily get that info and map it to devlink index. It
>>>>>>> aligns with nl80211 iface.
>>>>>>>
>>>>>>> Do you really want to do commands like:
>>>>>>> myhost:~$ dl dev show pci_0000:01:00.0
>>>>>>> ?
>>>>>>
>>>>>> Yes, exactly I would. I would put them into a boot-up script based on my
>>>>>> system configuration and can be sure it will work the next boot, too, and
>>>>>> adapt them when I replace the hardware or do some configuration changes.
>>>>>>
>>>>>> I think sysadmins or scripts are the primary users of this interface not
>>>>>> kernel developers which switch their settings around all the time, no?
>>>>>
>>>>> I can easily add this to the userspace tool to accept "pci_0000:01:00.0"
>>>>> format and to map it internally to devlink index. No problem.
>>>>
>>>> I argue for this stable topology identifier to be the default. Especially if
>>>> you add device info before the actual module is loaded (this is during
>>>> initramfs, when udev cannot rename devlink names to stable ones), a user has
>>>> to deal with pre-devlink-ids before rename and after. Do you have plans how
>>>> to address that?
>>>
>>> You can still access devlink using pci_addr using dl. I don't see a
>>> problem.
>>
>> I don't really see a reason why the devlink indexes/names exist inside the
>> kernel instead of a stable topology identifier. They confuse users and add
>> more unnecessary code to the kernel. Shells have environment variables for
>> that. ;) This is a low-level kernel setting tool IMHO.
>>
>> I just see the problem that users use the devlink* names and we get reports
>> because stuff breaks because they don't use the stable identifiers. That is
>> all.
>>
>
> you can have stable devlink name using udev. For pre-udev usage, you can
> use pci address directly. I don't like to use pci address for every
> dl command. I would like to have some more convenient name handle -
> devlink name.

Okay then - I was not sure if this low-level kernel command line tool 
deserves changes to the whole booting infrastructure to make it safe to use.

>>>> Current initramfs for stable interface names uses EUI48 based mac addresses
>>>> most of the time and udev runs then after the pivot_root.
>>>>
>>>> The devlink names can easily be aliases in user space.
>>>
>>> I don't want to store them anywhere. I just use "dl" tool and pass the
>>> name there.
>>
>> Sorry?
>
> you said - "The devlink names can easily be aliases in user space." -
> where do you want to store them?

You can easily store them in your shell environment or add a feature to 
the dl command to read aliases from a file (will be much more simple 
than walking all devlinks and checking for a match if you implement it 
the reverse way).

Note, if you want to have kernel provided names, they need to be stored 
in user space anyway because of udev renaming and the persisting database.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 11:26   ` Hannes Frederic Sowa
  2016-02-23 12:21     ` Jiri Pirko
@ 2016-02-23 17:31     ` Stephen Hemminger
  2016-02-24  7:15       ` Jiri Pirko
  1 sibling, 1 reply; 42+ messages in thread
From: Stephen Hemminger @ 2016-02-23 17:31 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Jiri Pirko, netdev, davem, idosch, eladr, yotamg, ogerlitz,
	yishaih, dledford, sean.hefty, hal.rosenstock, eugenia, roopa,
	nikolay, hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer,
	ivecera, rami.rosen

On Tue, 23 Feb 2016 12:26:00 +0100
Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:

> Hi Jiri,
> 
> On 22.02.2016 19:31, Jiri Pirko wrote:
> > From: Jiri Pirko <jiri@mellanox.com>
> >
> > So far, there has been an mlx4-specific sysfs file allowing user to
> > change port type to either Ethernet of InfiniBand. This is very
> > inconvenient.
> 
> Again, I want to express my concerns regarding all of this until this 
> will be integrated into udev/systemd for stable device names. While one 
> can build wrapper code around devlink to have stable devlink ports, I 
> don't see a reason to include kernel code which actually has more 
> problems than the sysfs approach. This harms admins to use those devices 
> and will additionally require user space to write boiler plate code.
> 
> Thanks,
> Hannes
> 

I appreciate that you need to have a lighterweight model for
network devices. But have to agree with Hannes.

This code breaks the model expected by applications like Quagga, SNMP
and lots of other legacy code. Is this really going to work with the
legacy Linux model.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 12:21     ` Jiri Pirko
  2016-02-23 13:28       ` Hannes Frederic Sowa
@ 2016-02-23 18:16       ` David Miller
  1 sibling, 0 replies; 42+ messages in thread
From: David Miller @ 2016-02-23 18:16 UTC (permalink / raw)
  To: jiri
  Cc: hannes, netdev, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer, ivecera,
	rami.rosen

From: Jiri Pirko <jiri@resnulli.us>
Date: Tue, 23 Feb 2016 13:21:10 +0100

> Sysfs is not the place to do this things.

+1

> It was already discussed here multiple times. There was and attempt
> to use configfs, which was also refused. Netlink is the only place
> to go. For multiple reasons, including well defined api and
> behaviour, notifications, etc.

+1

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [patch net-next 1/9] Introduce devlink infrastructure
  2016-02-22 18:31 ` [patch net-next 1/9] Introduce devlink infrastructure Jiri Pirko
  2016-02-22 22:29   ` roopa
@ 2016-02-24  7:02   ` Yuval Mintz
  2016-02-24 10:56     ` Jiri Pirko
  1 sibling, 1 reply; 42+ messages in thread
From: Yuval Mintz @ 2016-02-24  7:02 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: David Miller, idosch, eladr, yotamg, ogerlitz, yishaih, dledford,
	sean.hefty, hal.rosenstock, eugenia, roopa, nikolay, hadarh, jhs,
	john.fastabend, jeffrey.t.kirsher, brouer, ivecera,

> + * An overall lock guarding every operation comming from userspace.
> + * If also guards devlink devices list and it is taken when
> + * driver registers/unregisters it.
Several typos in comment.

> +static void devlink_notify(struct devlink *devlink, enum
> +devlink_command cmd) {
...
> +	WARN_ON(cmd != DEVLINK_CMD_NEW && cmd !=
> DEVLINK_CMD_DEL);
Given this should never happen, shouldn't this be ONCE?

> +static void devlink_port_notify(struct devlink_port *devlink_port,
...
> +	WARN_ON(cmd != DEVLINK_CMD_PORT_NEW && cmd !=
> DEVLINK_CMD_PORT_DEL);
Likewise

> +static void __devlink_port_type_set(struct devlink_port *devlink_port,
...
> +	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); }
Why is this PORT_NEW? Shouldn't it be PORT_SET?
Also, curly bracers are repeatedly on last line of function [if this file].
Is this by design?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 3/9] mlx4: Implement port type setting via devlink interface
  2016-02-23 17:31     ` Stephen Hemminger
@ 2016-02-24  7:15       ` Jiri Pirko
  0 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-24  7:15 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Hannes Frederic Sowa, netdev, davem, idosch, eladr, yotamg,
	ogerlitz, yishaih, dledford, sean.hefty, hal.rosenstock, eugenia,
	roopa, nikolay, hadarh, jhs, john.fastabend, jeffrey.t.kirsher,
	brouer, ivecera, rami.rosen

Tue, Feb 23, 2016 at 06:31:39PM CET, stephen@networkplumber.org wrote:
>On Tue, 23 Feb 2016 12:26:00 +0100
>Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
>
>> Hi Jiri,
>> 
>> On 22.02.2016 19:31, Jiri Pirko wrote:
>> > From: Jiri Pirko <jiri@mellanox.com>
>> >
>> > So far, there has been an mlx4-specific sysfs file allowing user to
>> > change port type to either Ethernet of InfiniBand. This is very
>> > inconvenient.
>> 
>> Again, I want to express my concerns regarding all of this until this 
>> will be integrated into udev/systemd for stable device names. While one 
>> can build wrapper code around devlink to have stable devlink ports, I 
>> don't see a reason to include kernel code which actually has more 
>> problems than the sysfs approach. This harms admins to use those devices 
>> and will additionally require user space to write boiler plate code.
>> 
>> Thanks,
>> Hannes
>> 
>
>I appreciate that you need to have a lighterweight model for
>network devices. But have to agree with Hannes.

No, I don't need to have lighterweight model for network device. This
patch does nothing like that.


>
>This code breaks the model expected by applications like Quagga, SNMP
>and lots of other legacy code. Is this really going to work with the
>legacy Linux model.

No, this patch does not break anything. The original netdev still stay.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch net-next 1/9] Introduce devlink infrastructure
  2016-02-24  7:02   ` Yuval Mintz
@ 2016-02-24 10:56     ` Jiri Pirko
  0 siblings, 0 replies; 42+ messages in thread
From: Jiri Pirko @ 2016-02-24 10:56 UTC (permalink / raw)
  To: Yuval Mintz
  Cc: netdev, David Miller, idosch, eladr, yotamg, ogerlitz, yishaih,
	dledford, sean.hefty, hal.rosenstock, eugenia, roopa, nikolay,
	hadarh, jhs, john.fastabend, jeffrey.t.kirsher, brouer,
	ivecera@redhat.com

Wed, Feb 24, 2016 at 08:02:32AM CET, Yuval.Mintz@qlogic.com wrote:
>> + * An overall lock guarding every operation comming from userspace.
>> + * If also guards devlink devices list and it is taken when
>> + * driver registers/unregisters it.
>Several typos in comment.

Already fixed in v2.

>
>> +static void devlink_notify(struct devlink *devlink, enum
>> +devlink_command cmd) {
>...
>> +	WARN_ON(cmd != DEVLINK_CMD_NEW && cmd !=
>> DEVLINK_CMD_DEL);
>Given this should never happen, shouldn't this be ONCE?

Given this should never happen and only warns the developer fiddling
with this, I don't think it matters.


>
>> +static void devlink_port_notify(struct devlink_port *devlink_port,
>...
>> +	WARN_ON(cmd != DEVLINK_CMD_PORT_NEW && cmd !=
>> DEVLINK_CMD_PORT_DEL);
>Likewise
>
>> +static void __devlink_port_type_set(struct devlink_port *devlink_port,
>...
>> +	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW); }
>Why is this PORT_NEW? Shouldn't it be PORT_SET?
>Also, curly bracers are repeatedly on last line of function [if this file].
>Is this by design?

SET is only for userspace->kernel messages. NEW is for reporting events
back. This is consistend with rest of the netlink messages out there,
including rtnl and nl80211

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2016-02-24 10:56 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-22 18:31 [patch net-next 0/9] Introduce devlink interface and first drivers to use it Jiri Pirko
2016-02-22 18:31 ` [patch net-next 1/9] Introduce devlink infrastructure Jiri Pirko
2016-02-22 22:29   ` roopa
2016-02-23  7:47     ` Jiri Pirko
2016-02-24  7:02   ` Yuval Mintz
2016-02-24 10:56     ` Jiri Pirko
2016-02-22 18:31 ` [patch net-next 2/9] mlx4: Implement devlink interface Jiri Pirko
2016-02-22 18:31 ` [patch net-next 3/9] mlx4: Implement port type setting via " Jiri Pirko
2016-02-23 11:26   ` Hannes Frederic Sowa
2016-02-23 12:21     ` Jiri Pirko
2016-02-23 13:28       ` Hannes Frederic Sowa
2016-02-23 14:26         ` Jiri Pirko
2016-02-23 15:16           ` Hannes Frederic Sowa
2016-02-23 15:30             ` Jiri Pirko
2016-02-23 15:57               ` Hannes Frederic Sowa
2016-02-23 16:04                 ` Jiri Pirko
2016-02-23 16:45                   ` Hannes Frederic Sowa
2016-02-23 16:55                     ` Jiri Pirko
2016-02-23 17:07                       ` Hannes Frederic Sowa
2016-02-23 15:20           ` Andy Gospodarek
2016-02-23 15:31             ` Jiri Pirko
2016-02-23 18:16       ` David Miller
2016-02-23 17:31     ` Stephen Hemminger
2016-02-24  7:15       ` Jiri Pirko
2016-02-22 18:31 ` [patch net-next 4/9] mlxsw: Implement " Jiri Pirko
2016-02-22 18:32 ` [patch net-next 5/9] mlxsw: core: Add devlink port splitter callbacks Jiri Pirko
2016-02-22 18:32 ` [patch net-next 6/9] mlxsw: spectrum: Unmap local port from module during teardown Jiri Pirko
2016-02-22 20:32   ` John Fastabend
2016-02-22 21:00     ` Ido Schimmel
2016-02-22 21:08       ` John Fastabend
2016-02-23  4:50     ` Andy Gospodarek
2016-02-22 18:32 ` [patch net-next 7/9] mlxsw: spectrum: Store local port to module mapping during init Jiri Pirko
2016-02-22 18:32 ` [patch net-next 8/9] mlxsw: spectrum: Mark unused ports using NULL Jiri Pirko
2016-02-22 18:32 ` [patch net-next 9/9] mlxsw: spectrum: Introduce port splitting Jiri Pirko
2016-02-23  5:12 ` [patch net-next 0/9] Introduce devlink interface and first drivers to use it Andy Gospodarek
2016-02-23  7:32   ` Jiri Pirko
2016-02-23 14:34     ` Andy Gospodarek
2016-02-23 14:45       ` Jiri Pirko
2016-02-23 15:55         ` Andy Gospodarek
2016-02-23 16:01           ` Jiri Pirko
2016-02-23 16:19             ` Andy Gospodarek
2016-02-23 16:23               ` Jiri Pirko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.