netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch net-next RFC 00/10] introduce line card support for modular switch
@ 2021-01-13 12:12 Jiri Pirko
  2021-01-13 12:12 ` [patch net-next RFC 01/10] devlink: add support to create line card and expose to user Jiri Pirko
                   ` (14 more replies)
  0 siblings, 15 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-13 12:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, jacob.e.keller, roopa, mlxsw

From: Jiri Pirko <jiri@nvidia.com>

This patchset introduces support for modular switch systems.
NVIDIA Mellanox SN4800 is an example of such. It contains 8 slots
to accomodate line cards. Available line cards include:
16X 100GbE (QSFP28)
8X 200GbE (QSFP56)
4X 400GbE (QSFP-DD)

Similar to split cabels, it is essencial for the correctness of
configuration and funcionality to treat the line card entities
in the same way, no matter the line card is inserted or not.
Meaning, the netdevice of a line card port cannot just disappear
when line card is removed. Also, system admin needs to be able
to apply configuration on netdevices belonging to line card port
even before the linecard gets inserted.

To resolve this, a concept of "provisioning" is introduced.
The user may "provision" certain slot with a line card type.
Driver then creates all instances (devlink ports, netdevices, etc)
related to this line card type. The carrier of netdevices stays down.
Once the line card is inserted and activated, the carrier of the
related netdevices goes up.

Once user does not want to use the line card related instances
anymore, he can "unprovision" the slot. Driver then removes the
instances.

Patches 1-5 are extending devlink driver API and UAPI in order to
register, show, dump, provision and activate the line card.
Patches 6-9 are implementing the introduced API in netdevsim

Example:

# Create a new netdevsim device, with no ports and 2 line cards:
$ echo "10 0 2" >/sys/bus/netdevsim/new_device
$ devlink port # No ports are listed
$ devlink lc
netdevsim/netdevsim10:
  lc 0 state unprovisioned
    supported_types:
       card1port card2ports card4ports
  lc 1 state unprovisioned
    supported_types:
       card1port card2ports card4ports

# Note that driver advertizes supported line card types. In case of
# netdevsim, these are 3.

$ devlink lc provision netdevsim/netdevsim10 lc 0 type card4ports
$ devlink lc
netdevsim/netdevsim10:
  lc 0 state provisioned type card4ports
    supported_types:
       card1port card2ports card4ports
  lc 1 state unprovisioned
    supported_types:
       card1port card2ports card4ports
$ devlink port
netdevsim/netdevsim10/1000: type eth netdev eni10nl0p1 flavour physical lc 0 port 1 splittable false
netdevsim/netdevsim10/1001: type eth netdev eni10nl0p2 flavour physical lc 0 port 2 splittable false
netdevsim/netdevsim10/1002: type eth netdev eni10nl0p3 flavour physical lc 0 port 3 splittable false
netdevsim/netdevsim10/1003: type eth netdev eni10nl0p4 flavour physical lc 0 port 4 splittable false
#                                                 ^^                    ^^^^
#                                     netdev name adjusted          index of a line card this port belongs to

$ ip link set eni10nl0p1 up 
$ ip link show eni10nl0p1   
165: eni10nl0p1: <NO-CARRIER,BROADCAST,NOARP,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff

# Now activate the line card using debugfs. That emulates plug-in event
# on real hardware:
$ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
$ ip link show eni10nl0p1
165: eni10nl0p1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
# The carrier is UP now.

Jiri Pirko (10):
  devlink: add support to create line card and expose to user
  devlink: implement line card provisioning
  devlink: implement line card active state
  devlink: append split port number to the port name
  devlink: add port to line card relationship set
  netdevsim: introduce line card support
  netdevsim: allow port objects to be linked with line cards
  netdevsim: create devlink line card object and implement provisioning
  netdevsim: implement line card activation
  selftests: add netdevsim devlink lc test

 drivers/net/netdevsim/bus.c                   |  21 +-
 drivers/net/netdevsim/dev.c                   | 370 ++++++++++++++-
 drivers/net/netdevsim/netdev.c                |   2 +
 drivers/net/netdevsim/netdevsim.h             |  23 +
 include/net/devlink.h                         |  44 ++
 include/uapi/linux/devlink.h                  |  25 +
 net/core/devlink.c                            | 443 +++++++++++++++++-
 .../drivers/net/netdevsim/devlink.sh          |  62 ++-
 8 files changed, 964 insertions(+), 26 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [patch net-next RFC 01/10] devlink: add support to create line card and expose to user
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
@ 2021-01-13 12:12 ` Jiri Pirko
  2021-01-15 15:47   ` Ido Schimmel
  2021-01-13 12:12 ` [patch net-next RFC 02/10] devlink: implement line card provisioning Jiri Pirko
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-13 12:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, jacob.e.keller, roopa, mlxsw

From: Jiri Pirko <jiri@nvidia.com>

Extend the devlink API so the driver is going to be able to create and
destroy linecard instances. There can be multiple line cards per devlink
device. Expose this new type of object over devlink netlink API to the
userspace, with notifications.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 include/net/devlink.h        |  10 ++
 include/uapi/linux/devlink.h |   7 ++
 net/core/devlink.c           | 227 ++++++++++++++++++++++++++++++++++-
 3 files changed, 243 insertions(+), 1 deletion(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index dd0c0b8fba6e..67c2547d5ef9 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -45,6 +45,7 @@ struct devlink {
 	struct list_head trap_list;
 	struct list_head trap_group_list;
 	struct list_head trap_policer_list;
+	struct list_head linecard_list;
 	const struct devlink_ops *ops;
 	struct xarray snapshot_ids;
 	struct devlink_dev_stats stats;
@@ -138,6 +139,12 @@ struct devlink_port {
 	struct mutex reporters_lock; /* Protects reporter_list */
 };
 
+struct devlink_linecard {
+	struct list_head list;
+	struct devlink *devlink;
+	unsigned int index;
+};
+
 struct devlink_sb_pool_info {
 	enum devlink_sb_pool_type pool_type;
 	u32 size;
@@ -1407,6 +1414,9 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro
 				   u16 pf, bool external);
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
 				   u16 pf, u16 vf, bool external);
+struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
+						 unsigned int linecard_index);
+void devlink_linecard_destroy(struct devlink_linecard *linecard);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u32 size, u16 ingress_pools_count,
 			u16 egress_pools_count, u16 ingress_tc_count,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index cf89c318f2ac..e5ed0522591f 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -126,6 +126,11 @@ enum devlink_command {
 
 	DEVLINK_CMD_HEALTH_REPORTER_TEST,
 
+	DEVLINK_CMD_LINECARD_GET,		/* can dump */
+	DEVLINK_CMD_LINECARD_SET,
+	DEVLINK_CMD_LINECARD_NEW,
+	DEVLINK_CMD_LINECARD_DEL,
+
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -529,6 +534,8 @@ enum devlink_attr {
 	DEVLINK_ATTR_RELOAD_ACTION_INFO,        /* nested */
 	DEVLINK_ATTR_RELOAD_ACTION_STATS,       /* nested */
 
+	DEVLINK_ATTR_LINECARD_INDEX,		/* u32 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index f86688bfad46..564e921133cf 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -187,6 +187,46 @@ static struct devlink_port *devlink_port_get_from_info(struct devlink *devlink,
 	return devlink_port_get_from_attrs(devlink, info->attrs);
 }
 
+static struct devlink_linecard *
+devlink_linecard_get_by_index(struct devlink *devlink,
+			      unsigned int linecard_index)
+{
+	struct devlink_linecard *devlink_linecard;
+
+	list_for_each_entry(devlink_linecard, &devlink->linecard_list, list) {
+		if (devlink_linecard->index == linecard_index)
+			return devlink_linecard;
+	}
+	return NULL;
+}
+
+static bool devlink_linecard_index_exists(struct devlink *devlink,
+					  unsigned int linecard_index)
+{
+	return devlink_linecard_get_by_index(devlink, linecard_index);
+}
+
+static struct devlink_linecard *
+devlink_linecard_get_from_attrs(struct devlink *devlink, struct nlattr **attrs)
+{
+	if (attrs[DEVLINK_ATTR_LINECARD_INDEX]) {
+		u32 linecard_index = nla_get_u32(attrs[DEVLINK_ATTR_LINECARD_INDEX]);
+		struct devlink_linecard *linecard;
+
+		linecard = devlink_linecard_get_by_index(devlink, linecard_index);
+		if (!linecard)
+			return ERR_PTR(-ENODEV);
+		return linecard;
+	}
+	return ERR_PTR(-EINVAL);
+}
+
+static struct devlink_linecard *
+devlink_linecard_get_from_info(struct devlink *devlink, struct genl_info *info)
+{
+	return devlink_linecard_get_from_attrs(devlink, info->attrs);
+}
+
 struct devlink_sb {
 	struct list_head list;
 	unsigned int index;
@@ -405,16 +445,18 @@ devlink_region_snapshot_get_by_id(struct devlink_region *region, u32 id)
 
 #define DEVLINK_NL_FLAG_NEED_PORT		BIT(0)
 #define DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT	BIT(1)
+#define DEVLINK_NL_FLAG_NEED_LINECARD		BIT(2)
 
 /* The per devlink instance lock is taken by default in the pre-doit
  * operation, yet several commands do not require this. The global
  * devlink lock is taken and protects from disruption by user-calls.
  */
-#define DEVLINK_NL_FLAG_NO_LOCK			BIT(2)
+#define DEVLINK_NL_FLAG_NO_LOCK			BIT(3)
 
 static int devlink_nl_pre_doit(const struct genl_ops *ops,
 			       struct sk_buff *skb, struct genl_info *info)
 {
+	struct devlink_linecard *linecard;
 	struct devlink_port *devlink_port;
 	struct devlink *devlink;
 	int err;
@@ -439,6 +481,13 @@ static int devlink_nl_pre_doit(const struct genl_ops *ops,
 		devlink_port = devlink_port_get_from_info(devlink, info);
 		if (!IS_ERR(devlink_port))
 			info->user_ptr[1] = devlink_port;
+	} else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_LINECARD) {
+		linecard = devlink_linecard_get_from_info(devlink, info);
+		if (IS_ERR(linecard)) {
+			err = PTR_ERR(linecard);
+			goto unlock;
+		}
+		info->user_ptr[1] = linecard;
 	}
 	return 0;
 
@@ -1136,6 +1185,121 @@ static int devlink_nl_cmd_port_unsplit_doit(struct sk_buff *skb,
 	return devlink_port_unsplit(devlink, port_index, info->extack);
 }
 
+static int devlink_nl_linecard_fill(struct sk_buff *msg,
+				    struct devlink *devlink,
+				    struct devlink_linecard *linecard,
+				    enum devlink_command cmd, u32 portid,
+				    u32 seq, int flags,
+				    struct netlink_ext_ack *extack)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (devlink_nl_put_handle(msg, devlink))
+		goto nla_put_failure;
+	if (nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX, linecard->index))
+		goto nla_put_failure;
+
+	genlmsg_end(msg, hdr);
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+static void devlink_linecard_notify(struct devlink_linecard *linecard,
+				    enum devlink_command cmd)
+{
+	struct devlink *devlink = linecard->devlink;
+	struct sk_buff *msg;
+	int err;
+
+	WARN_ON(cmd != DEVLINK_CMD_LINECARD_NEW &&
+		cmd != DEVLINK_CMD_LINECARD_DEL);
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return;
+
+	err = devlink_nl_linecard_fill(msg, devlink, linecard, cmd, 0, 0, 0,
+				       NULL);
+	if (err) {
+		nlmsg_free(msg);
+		return;
+	}
+
+	genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+				msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
+}
+
+static int devlink_nl_cmd_linecard_get_doit(struct sk_buff *skb,
+					    struct genl_info *info)
+{
+	struct devlink_linecard *linecard = info->user_ptr[1];
+	struct devlink *devlink = linecard->devlink;
+	struct sk_buff *msg;
+	int err;
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	err = devlink_nl_linecard_fill(msg, devlink, linecard,
+				       DEVLINK_CMD_LINECARD_NEW,
+				       info->snd_portid, info->snd_seq, 0,
+				       info->extack);
+	if (err) {
+		nlmsg_free(msg);
+		return err;
+	}
+
+	return genlmsg_reply(msg, info);
+}
+
+static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg,
+					      struct netlink_callback *cb)
+{
+	struct devlink_linecard *linecard;
+	struct devlink *devlink;
+	int start = cb->args[0];
+	int idx = 0;
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			continue;
+		mutex_lock(&devlink->lock);
+		list_for_each_entry(linecard, &devlink->linecard_list, list) {
+			if (idx < start) {
+				idx++;
+				continue;
+			}
+			err = devlink_nl_linecard_fill(msg, devlink, linecard,
+						       DEVLINK_CMD_LINECARD_NEW,
+						       NETLINK_CB(cb->skb).portid,
+						       cb->nlh->nlmsg_seq,
+						       NLM_F_MULTI,
+						       cb->extack);
+			if (err) {
+				mutex_unlock(&devlink->lock);
+				goto out;
+			}
+			idx++;
+		}
+		mutex_unlock(&devlink->lock);
+	}
+out:
+	mutex_unlock(&devlink_mutex);
+
+	cb->args[0] = idx;
+	return msg->len;
+}
+
 static int devlink_nl_sb_fill(struct sk_buff *msg, struct devlink *devlink,
 			      struct devlink_sb *devlink_sb,
 			      enum devlink_command cmd, u32 portid,
@@ -7594,6 +7758,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_RELOAD_ACTION] = NLA_POLICY_RANGE(NLA_U8, DEVLINK_RELOAD_ACTION_DRIVER_REINIT,
 							DEVLINK_RELOAD_ACTION_MAX),
 	[DEVLINK_ATTR_RELOAD_LIMITS] = NLA_POLICY_BITFIELD32(DEVLINK_RELOAD_LIMITS_VALID_MASK),
+	[DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 },
 };
 
 static const struct genl_small_ops devlink_nl_ops[] = {
@@ -7633,6 +7798,14 @@ static const struct genl_small_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NO_LOCK,
 	},
+	{
+		.cmd = DEVLINK_CMD_LINECARD_GET,
+		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+		.doit = devlink_nl_cmd_linecard_get_doit,
+		.dumpit = devlink_nl_cmd_linecard_get_dumpit,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD,
+		/* can be retrieved by unprivileged users */
+	},
 	{
 		.cmd = DEVLINK_CMD_SB_GET,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
@@ -7982,6 +8155,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
 	xa_init_flags(&devlink->snapshot_ids, XA_FLAGS_ALLOC);
 	__devlink_net_set(devlink, &init_net);
 	INIT_LIST_HEAD(&devlink->port_list);
+	INIT_LIST_HEAD(&devlink->linecard_list);
 	INIT_LIST_HEAD(&devlink->sb_list);
 	INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list);
 	INIT_LIST_HEAD(&devlink->resource_list);
@@ -8084,6 +8258,7 @@ void devlink_free(struct devlink *devlink)
 	WARN_ON(!list_empty(&devlink->resource_list));
 	WARN_ON(!list_empty(&devlink->dpipe_table_list));
 	WARN_ON(!list_empty(&devlink->sb_list));
+	WARN_ON(!list_empty(&devlink->linecard_list));
 	WARN_ON(!list_empty(&devlink->port_list));
 
 	xa_destroy(&devlink->snapshot_ids);
@@ -8428,6 +8603,56 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 	return 0;
 }
 
+/**
+ *	devlink_linecard_register - Register devlink linecard
+ *
+ *	@devlink: devlink
+ *	@devlink_linecard: devlink linecard
+ *	@linecard_index: driver-specific numerical identifier of the linecard
+ *
+ *	Create devlink linecard instance with provided linecard index.
+ *	Caller can use any indexing, even hw-related one.
+ */
+struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
+						 unsigned int linecard_index)
+{
+	struct devlink_linecard *linecard;
+
+	mutex_lock(&devlink->lock);
+	if (devlink_linecard_index_exists(devlink, linecard_index)) {
+		mutex_unlock(&devlink->lock);
+		return ERR_PTR(-EEXIST);
+	}
+
+	linecard = kzalloc(sizeof(*linecard), GFP_KERNEL);
+	if (!linecard)
+		return ERR_PTR(-ENOMEM);
+
+	linecard->devlink = devlink;
+	linecard->index = linecard_index;
+	list_add_tail(&linecard->list, &devlink->linecard_list);
+	mutex_unlock(&devlink->lock);
+	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
+	return linecard;
+}
+EXPORT_SYMBOL_GPL(devlink_linecard_create);
+
+/**
+ *	devlink_linecard_destroy - Destroy devlink linecard
+ *
+ *	@devlink_linecard: devlink linecard
+ */
+void devlink_linecard_destroy(struct devlink_linecard *linecard)
+{
+	struct devlink *devlink = linecard->devlink;
+
+	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_DEL);
+	mutex_lock(&devlink->lock);
+	list_del(&linecard->list);
+	mutex_unlock(&devlink->lock);
+}
+EXPORT_SYMBOL_GPL(devlink_linecard_create);
+
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u32 size, u16 ingress_pools_count,
 			u16 egress_pools_count, u16 ingress_tc_count,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [patch net-next RFC 02/10] devlink: implement line card provisioning
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
  2021-01-13 12:12 ` [patch net-next RFC 01/10] devlink: add support to create line card and expose to user Jiri Pirko
@ 2021-01-13 12:12 ` Jiri Pirko
  2021-01-15 16:03   ` Ido Schimmel
  2021-01-13 12:12 ` [patch net-next RFC 03/10] devlink: implement line card active state Jiri Pirko
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-13 12:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, jacob.e.keller, roopa, mlxsw

From: Jiri Pirko <jiri@nvidia.com>

In order to be able to configure all needed stuff on a port/netdevice
of a line card without the line card being present, introduce line card
provisioning. Basically provisioning will create a placeholder for
instances (ports/netdevices) for a line card type.

Allow the user to query the supported line card types over line card
get command. Then implement two netlink commands to allow user to
provision/unprovision the line card with selected line card type.

On the driver API side, add provision/unprovision ops and supported
types array to be advertised. Upon provision op call, the driver should
take care of creating the instances for the particular line card type.
Introduce provision_set/clear() functions to be called by the driver
once the provisioning/unprovisioning is done on its side.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 include/net/devlink.h        |  31 +++++++-
 include/uapi/linux/devlink.h |  17 +++++
 net/core/devlink.c           | 141 ++++++++++++++++++++++++++++++++++-
 3 files changed, 185 insertions(+), 4 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 67c2547d5ef9..854abd53e9ea 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -139,10 +139,33 @@ struct devlink_port {
 	struct mutex reporters_lock; /* Protects reporter_list */
 };
 
+struct devlink_linecard_ops;
+
 struct devlink_linecard {
 	struct list_head list;
 	struct devlink *devlink;
 	unsigned int index;
+	const struct devlink_linecard_ops *ops;
+	void *priv;
+	enum devlink_linecard_state state;
+	const char *provisioned_type;
+};
+
+/**
+ * struct devlink_linecard_ops - Linecard operations
+ * @supported_types: array of supported types of linecards
+ * @supported_types_count: number of elements in the array above
+ * @provision: callback to provision the linecard slot with certain
+ *	       type of linecard
+ * @unprovision: callback to unprovision the linecard slot
+ */
+struct devlink_linecard_ops {
+	const char **supported_types;
+	unsigned int supported_types_count;
+	int (*provision)(struct devlink_linecard *linecard, void *priv,
+			 u32 type_index, struct netlink_ext_ack *extack);
+	int (*unprovision)(struct devlink_linecard *linecard, void *priv,
+			   struct netlink_ext_ack *extack);
 };
 
 struct devlink_sb_pool_info {
@@ -1414,9 +1437,13 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro
 				   u16 pf, bool external);
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
 				   u16 pf, u16 vf, bool external);
-struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
-						 unsigned int linecard_index);
+struct devlink_linecard *
+devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index,
+			const struct devlink_linecard_ops *ops, void *priv);
 void devlink_linecard_destroy(struct devlink_linecard *linecard);
+void devlink_linecard_provision_set(struct devlink_linecard *linecard,
+				    u32 type_index);
+void devlink_linecard_provision_clear(struct devlink_linecard *linecard);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u32 size, u16 ingress_pools_count,
 			u16 egress_pools_count, u16 ingress_tc_count,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index e5ed0522591f..4111ddcc000b 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -131,6 +131,9 @@ enum devlink_command {
 	DEVLINK_CMD_LINECARD_NEW,
 	DEVLINK_CMD_LINECARD_DEL,
 
+	DEVLINK_CMD_LINECARD_PROVISION,
+	DEVLINK_CMD_LINECARD_UNPROVISION,
+
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -329,6 +332,17 @@ enum devlink_reload_limit {
 
 #define DEVLINK_RELOAD_LIMITS_VALID_MASK (_BITUL(__DEVLINK_RELOAD_LIMIT_MAX) - 1)
 
+enum devlink_linecard_state {
+	DEVLINK_LINECARD_STATE_UNSPEC,
+	DEVLINK_LINECARD_STATE_UNPROVISIONED,
+	DEVLINK_LINECARD_STATE_UNPROVISIONING,
+	DEVLINK_LINECARD_STATE_PROVISIONING,
+	DEVLINK_LINECARD_STATE_PROVISIONED,
+
+	__DEVLINK_LINECARD_STATE_MAX,
+	DEVLINK_LINECARD_STATE_MAX = __DEVLINK_LINECARD_STATE_MAX - 1
+};
+
 enum devlink_attr {
 	/* don't change the order or add anything between, this is ABI! */
 	DEVLINK_ATTR_UNSPEC,
@@ -535,6 +549,9 @@ enum devlink_attr {
 	DEVLINK_ATTR_RELOAD_ACTION_STATS,       /* nested */
 
 	DEVLINK_ATTR_LINECARD_INDEX,		/* u32 */
+	DEVLINK_ATTR_LINECARD_STATE,		/* u8 */
+	DEVLINK_ATTR_LINECARD_TYPE,		/* string */
+	DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES,	/* nested */
 
 	/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 564e921133cf..434eecc310c3 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1192,7 +1192,9 @@ static int devlink_nl_linecard_fill(struct sk_buff *msg,
 				    u32 seq, int flags,
 				    struct netlink_ext_ack *extack)
 {
+	struct nlattr *attr;
 	void *hdr;
+	int i;
 
 	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
 	if (!hdr)
@@ -1202,6 +1204,22 @@ static int devlink_nl_linecard_fill(struct sk_buff *msg,
 		goto nla_put_failure;
 	if (nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX, linecard->index))
 		goto nla_put_failure;
+	if (nla_put_u8(msg, DEVLINK_ATTR_LINECARD_STATE, linecard->state))
+		goto nla_put_failure;
+	if (linecard->state >= DEVLINK_LINECARD_STATE_PROVISIONED &&
+	    nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE,
+			   linecard->provisioned_type))
+		goto nla_put_failure;
+
+	attr = nla_nest_start(msg, DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES);
+	if (!attr)
+		return -EMSGSIZE;
+	for (i = 0; i < linecard->ops->supported_types_count; i++) {
+		if (nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE,
+				   linecard->ops->supported_types[i]))
+			goto nla_put_failure;
+	}
+	nla_nest_end(msg, attr);
 
 	genlmsg_end(msg, hdr);
 	return 0;
@@ -1300,6 +1318,68 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg,
 	return msg->len;
 }
 
+static int devlink_nl_cmd_linecard_provision_doit(struct sk_buff *skb,
+						  struct genl_info *info)
+{
+	struct devlink_linecard *linecard = info->user_ptr[1];
+	const char *type;
+	int i;
+
+	if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) {
+		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is currently being provisioned");
+		return -EBUSY;
+	}
+	if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) {
+		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is currently being unprovisioned");
+		return -EBUSY;
+	}
+	if (linecard->state != DEVLINK_LINECARD_STATE_UNPROVISIONED) {
+		NL_SET_ERR_MSG_MOD(info->extack, "Linecard already provisioned");
+		return -EBUSY;
+	}
+
+	if (!info->attrs[DEVLINK_ATTR_LINECARD_TYPE]) {
+		NL_SET_ERR_MSG_MOD(info->extack, "Provision type not provided");
+		return -EINVAL;
+	}
+
+	type = nla_data(info->attrs[DEVLINK_ATTR_LINECARD_TYPE]);
+	for (i = 0; i < linecard->ops->supported_types_count; i++) {
+		if (!strcmp(linecard->ops->supported_types[i], type)) {
+			linecard->state = DEVLINK_LINECARD_STATE_PROVISIONING;
+			devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
+			return linecard->ops->provision(linecard,
+							linecard->priv, i,
+							info->extack);
+		}
+	}
+	NL_SET_ERR_MSG_MOD(info->extack, "Unsupported provision type provided");
+	return -EINVAL;
+}
+
+static int devlink_nl_cmd_linecard_unprovision_doit(struct sk_buff *skb,
+						    struct genl_info *info)
+{
+	struct devlink_linecard *linecard = info->user_ptr[1];
+
+	if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) {
+		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is currently being provisioned");
+		return -EBUSY;
+	}
+	if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) {
+		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is currently being unprovisioned");
+		return -EBUSY;
+	}
+	if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONED) {
+		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is not provisioned");
+		return -EOPNOTSUPP;
+	}
+	linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONING;
+	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
+	return linecard->ops->unprovision(linecard, linecard->priv,
+					  info->extack);
+}
+
 static int devlink_nl_sb_fill(struct sk_buff *msg, struct devlink *devlink,
 			      struct devlink_sb *devlink_sb,
 			      enum devlink_command cmd, u32 portid,
@@ -7759,6 +7839,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 							DEVLINK_RELOAD_ACTION_MAX),
 	[DEVLINK_ATTR_RELOAD_LIMITS] = NLA_POLICY_BITFIELD32(DEVLINK_RELOAD_LIMITS_VALID_MASK),
 	[DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 },
+	[DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING },
 };
 
 static const struct genl_small_ops devlink_nl_ops[] = {
@@ -7806,6 +7887,20 @@ static const struct genl_small_ops devlink_nl_ops[] = {
 		.internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD,
 		/* can be retrieved by unprivileged users */
 	},
+	{
+		.cmd = DEVLINK_CMD_LINECARD_PROVISION,
+		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+		.doit = devlink_nl_cmd_linecard_provision_doit,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD,
+	},
+	{
+		.cmd = DEVLINK_CMD_LINECARD_UNPROVISION,
+		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+		.doit = devlink_nl_cmd_linecard_unprovision_doit,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD,
+	},
 	{
 		.cmd = DEVLINK_CMD_SB_GET,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
@@ -8613,11 +8708,17 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
  *	Create devlink linecard instance with provided linecard index.
  *	Caller can use any indexing, even hw-related one.
  */
-struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
-						 unsigned int linecard_index)
+struct devlink_linecard *
+devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index,
+			const struct devlink_linecard_ops *ops, void *priv)
 {
 	struct devlink_linecard *linecard;
 
+	if (WARN_ON(!ops || !ops->supported_types ||
+		    !ops->supported_types_count ||
+		    !ops->provision || !ops->unprovision))
+		return ERR_PTR(-EINVAL);
+
 	mutex_lock(&devlink->lock);
 	if (devlink_linecard_index_exists(devlink, linecard_index)) {
 		mutex_unlock(&devlink->lock);
@@ -8630,6 +8731,9 @@ struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
 
 	linecard->devlink = devlink;
 	linecard->index = linecard_index;
+	linecard->ops = ops;
+	linecard->priv = priv;
+	linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED;
 	list_add_tail(&linecard->list, &devlink->linecard_list);
 	mutex_unlock(&devlink->lock);
 	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
@@ -8653,6 +8757,39 @@ void devlink_linecard_destroy(struct devlink_linecard *linecard)
 }
 EXPORT_SYMBOL_GPL(devlink_linecard_create);
 
+/**
+ *	devlink_linecard_provision_set - Set provisioning on linecard
+ *
+ *	@devlink_linecard: devlink linecard
+ *	@type_index: index of the linecard type (in array of types in ops)
+ */
+void devlink_linecard_provision_set(struct devlink_linecard *linecard,
+				    u32 type_index)
+{
+	WARN_ON(type_index >= linecard->ops->supported_types_count);
+	mutex_lock(&linecard->devlink->lock);
+	linecard->state = DEVLINK_LINECARD_STATE_PROVISIONED;
+	linecard->provisioned_type = linecard->ops->supported_types[type_index];
+	mutex_unlock(&linecard->devlink->lock);
+	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
+}
+EXPORT_SYMBOL_GPL(devlink_linecard_provision_set);
+
+/**
+ *	devlink_linecard_provision_clear - Clear provisioning on linecard
+ *
+ *	@devlink_linecard: devlink linecard
+ */
+void devlink_linecard_provision_clear(struct devlink_linecard *linecard)
+{
+	mutex_lock(&linecard->devlink->lock);
+	linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED;
+	linecard->provisioned_type = NULL;
+	mutex_unlock(&linecard->devlink->lock);
+	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
+}
+EXPORT_SYMBOL_GPL(devlink_linecard_provision_clear);
+
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u32 size, u16 ingress_pools_count,
 			u16 egress_pools_count, u16 ingress_tc_count,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [patch net-next RFC 03/10] devlink: implement line card active state
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
  2021-01-13 12:12 ` [patch net-next RFC 01/10] devlink: add support to create line card and expose to user Jiri Pirko
  2021-01-13 12:12 ` [patch net-next RFC 02/10] devlink: implement line card provisioning Jiri Pirko
@ 2021-01-13 12:12 ` Jiri Pirko
  2021-01-15 16:06   ` Ido Schimmel
  2021-01-13 12:12 ` [patch net-next RFC 04/10] devlink: append split port number to the port name Jiri Pirko
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-13 12:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, jacob.e.keller, roopa, mlxsw

From: Jiri Pirko <jiri@nvidia.com>

Allow driver to mark a lin ecards as active. Expose this state to the
userspace over devlink netlink interface with proper notifications.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 include/net/devlink.h        |  4 ++++
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 46 ++++++++++++++++++++++++++++++++++++
 3 files changed, 51 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 854abd53e9ea..ec00cd94c626 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -149,6 +149,7 @@ struct devlink_linecard {
 	void *priv;
 	enum devlink_linecard_state state;
 	const char *provisioned_type;
+	bool active;
 };
 
 /**
@@ -1444,6 +1445,9 @@ void devlink_linecard_destroy(struct devlink_linecard *linecard);
 void devlink_linecard_provision_set(struct devlink_linecard *linecard,
 				    u32 type_index);
 void devlink_linecard_provision_clear(struct devlink_linecard *linecard);
+void devlink_linecard_activate(struct devlink_linecard *linecard);
+void devlink_linecard_deactivate(struct devlink_linecard *linecard);
+bool devlink_linecard_is_active(struct devlink_linecard *linecard);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u32 size, u16 ingress_pools_count,
 			u16 egress_pools_count, u16 ingress_tc_count,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 4111ddcc000b..d961d31fe288 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -338,6 +338,7 @@ enum devlink_linecard_state {
 	DEVLINK_LINECARD_STATE_UNPROVISIONING,
 	DEVLINK_LINECARD_STATE_PROVISIONING,
 	DEVLINK_LINECARD_STATE_PROVISIONED,
+	DEVLINK_LINECARD_STATE_ACTIVE,
 
 	__DEVLINK_LINECARD_STATE_MAX,
 	DEVLINK_LINECARD_STATE_MAX = __DEVLINK_LINECARD_STATE_MAX - 1
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 434eecc310c3..9c76edf8c8af 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -8790,6 +8790,52 @@ void devlink_linecard_provision_clear(struct devlink_linecard *linecard)
 }
 EXPORT_SYMBOL_GPL(devlink_linecard_provision_clear);
 
+/**
+ *	devlink_linecard_activate - Set linecard active
+ *
+ *	@devlink_linecard: devlink linecard
+ */
+void devlink_linecard_activate(struct devlink_linecard *linecard)
+{
+	mutex_lock(&linecard->devlink->lock);
+	WARN_ON(linecard->state != DEVLINK_LINECARD_STATE_PROVISIONED);
+	linecard->state = DEVLINK_LINECARD_STATE_ACTIVE;
+	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
+	mutex_unlock(&linecard->devlink->lock);
+}
+EXPORT_SYMBOL_GPL(devlink_linecard_activate);
+
+/**
+ *	devlink_linecard_deactivate - Set linecard deactive
+ *
+ *	@devlink_linecard: devlink linecard
+ */
+void devlink_linecard_deactivate(struct devlink_linecard *linecard)
+{
+	mutex_lock(&linecard->devlink->lock);
+	WARN_ON(linecard->state != DEVLINK_LINECARD_STATE_ACTIVE);
+	linecard->state = DEVLINK_LINECARD_STATE_PROVISIONED;
+	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
+	mutex_unlock(&linecard->devlink->lock);
+}
+EXPORT_SYMBOL_GPL(devlink_linecard_deactivate);
+
+/**
+ *	devlink_linecard_is_active - Check if active
+ *
+ *	@devlink_linecard: devlink linecard
+ */
+bool devlink_linecard_is_active(struct devlink_linecard *linecard)
+{
+	bool active;
+
+	mutex_lock(&linecard->devlink->lock);
+	active = linecard->state == DEVLINK_LINECARD_STATE_ACTIVE;
+	mutex_unlock(&linecard->devlink->lock);
+	return active;
+}
+EXPORT_SYMBOL_GPL(devlink_linecard_is_active);
+
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u32 size, u16 ingress_pools_count,
 			u16 egress_pools_count, u16 ingress_tc_count,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [patch net-next RFC 04/10] devlink: append split port number to the port name
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
                   ` (2 preceding siblings ...)
  2021-01-13 12:12 ` [patch net-next RFC 03/10] devlink: implement line card active state Jiri Pirko
@ 2021-01-13 12:12 ` Jiri Pirko
  2021-01-13 12:12 ` [patch net-next RFC 05/10] devlink: add port to line card relationship set Jiri Pirko
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-13 12:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, jacob.e.keller, roopa, mlxsw

From: Jiri Pirko <jiri@nvidia.com>

Instead of doing sprintf twice in case the port is split or not, append
the split port suffix in case the port is split.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 net/core/devlink.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 9c76edf8c8af..347976b88404 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -8654,12 +8654,10 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 	switch (attrs->flavour) {
 	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
 	case DEVLINK_PORT_FLAVOUR_VIRTUAL:
-		if (!attrs->split)
-			n = snprintf(name, len, "p%u", attrs->phys.port_number);
-		else
-			n = snprintf(name, len, "p%us%u",
-				     attrs->phys.port_number,
-				     attrs->phys.split_subport_number);
+		n = snprintf(name, len, "p%u", attrs->phys.port_number);
+		if (attrs->split)
+			n += snprintf(name + n, len - n, "s%u",
+				      attrs->phys.split_subport_number);
 		break;
 	case DEVLINK_PORT_FLAVOUR_CPU:
 	case DEVLINK_PORT_FLAVOUR_DSA:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [patch net-next RFC 05/10] devlink: add port to line card relationship set
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
                   ` (3 preceding siblings ...)
  2021-01-13 12:12 ` [patch net-next RFC 04/10] devlink: append split port number to the port name Jiri Pirko
@ 2021-01-13 12:12 ` Jiri Pirko
  2021-01-15 16:10   ` Ido Schimmel
  2021-01-13 12:12 ` [patch net-next RFC 06/10] netdevsim: introduce line card support Jiri Pirko
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-13 12:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, jacob.e.keller, roopa, mlxsw

From: Jiri Pirko <jiri@nvidia.com>

In order to properly inform user about relationship between port and
line card, introduce a driver API to set line card for a port. Use this
information to extend port devlink netlink message by line card index
and also include the line card index into phys_port_name and by that
into a netdevice name.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 include/net/devlink.h |  3 +++
 net/core/devlink.c    | 25 ++++++++++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index ec00cd94c626..cb911b6fdeda 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -137,6 +137,7 @@ struct devlink_port {
 	struct delayed_work type_warn_dw;
 	struct list_head reporter_list;
 	struct mutex reporters_lock; /* Protects reporter_list */
+	struct devlink_linecard *linecard;
 };
 
 struct devlink_linecard_ops;
@@ -1438,6 +1439,8 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro
 				   u16 pf, bool external);
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
 				   u16 pf, u16 vf, bool external);
+void devlink_port_linecard_set(struct devlink_port *devlink_port,
+			       struct devlink_linecard *linecard);
 struct devlink_linecard *
 devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index,
 			const struct devlink_linecard_ops *ops, void *priv);
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 347976b88404..2faa30cc5cce 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -855,6 +855,10 @@ static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
 		goto nla_put_failure;
 	if (devlink_nl_port_function_attrs_put(msg, devlink_port, extack))
 		goto nla_put_failure;
+	if (devlink_port->linecard &&
+	    nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX,
+			devlink_port->linecard->index))
+		goto nla_put_failure;
 
 	genlmsg_end(msg, hdr);
 	return 0;
@@ -8642,6 +8646,21 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set);
 
+/**
+ *	devlink_port_linecard_set - Link port with a linecard
+ *
+ *	@devlink_port: devlink port
+ *	@devlink_linecard: devlink linecard
+ */
+void devlink_port_linecard_set(struct devlink_port *devlink_port,
+			       struct devlink_linecard *linecard)
+{
+	if (WARN_ON(devlink_port->registered))
+		return;
+	devlink_port->linecard = linecard;
+}
+EXPORT_SYMBOL_GPL(devlink_port_linecard_set);
+
 static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 					     char *name, size_t len)
 {
@@ -8654,7 +8673,11 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 	switch (attrs->flavour) {
 	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
 	case DEVLINK_PORT_FLAVOUR_VIRTUAL:
-		n = snprintf(name, len, "p%u", attrs->phys.port_number);
+		if (devlink_port->linecard)
+			n = snprintf(name, len, "l%u",
+				     devlink_port->linecard->index);
+		n += snprintf(name + n, len - n, "p%u",
+			      attrs->phys.port_number);
 		if (attrs->split)
 			n += snprintf(name + n, len - n, "s%u",
 				      attrs->phys.split_subport_number);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [patch net-next RFC 06/10] netdevsim: introduce line card support
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
                   ` (4 preceding siblings ...)
  2021-01-13 12:12 ` [patch net-next RFC 05/10] devlink: add port to line card relationship set Jiri Pirko
@ 2021-01-13 12:12 ` Jiri Pirko
  2021-01-13 12:12 ` [patch net-next RFC 07/10] netdevsim: allow port objects to be linked with line cards Jiri Pirko
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-13 12:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, jacob.e.keller, roopa, mlxsw

From: Jiri Pirko <jiri@nvidia.com>

Add support for line card objects. Expose them over debugfs and allow
user to specify number of line cards to be created for a new device.
Similar to ports, the number of line cards is fixed.

Extend "new_device" sysfs file write by third number to allow to specify
number line cards like this:
$ echo "10 4 2" >/sys/bus/netdevsim/new_device

This command asks to create two line cards. By default, if this number
is not preset, no line card is created.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 drivers/net/netdevsim/bus.c       |  17 +++--
 drivers/net/netdevsim/dev.c       | 108 +++++++++++++++++++++++++++++-
 drivers/net/netdevsim/netdevsim.h |  15 +++++
 3 files changed, 133 insertions(+), 7 deletions(-)

diff --git a/drivers/net/netdevsim/bus.c b/drivers/net/netdevsim/bus.c
index 0e9511661601..ed57c012e660 100644
--- a/drivers/net/netdevsim/bus.c
+++ b/drivers/net/netdevsim/bus.c
@@ -179,29 +179,34 @@ static struct device_type nsim_bus_dev_type = {
 };
 
 static struct nsim_bus_dev *
-nsim_bus_dev_new(unsigned int id, unsigned int port_count);
+nsim_bus_dev_new(unsigned int id, unsigned int port_count,
+		 unsigned int linecard_count);
 
 static ssize_t
 new_device_store(struct bus_type *bus, const char *buf, size_t count)
 {
 	struct nsim_bus_dev *nsim_bus_dev;
+	unsigned int linecard_count;
 	unsigned int port_count;
 	unsigned int id;
 	int err;
 
-	err = sscanf(buf, "%u %u", &id, &port_count);
+	err = sscanf(buf, "%u %u %u", &id, &port_count, &linecard_count);
 	switch (err) {
 	case 1:
 		port_count = 1;
 		fallthrough;
 	case 2:
+		linecard_count = 0;
+		fallthrough;
+	case 3:
 		if (id > INT_MAX) {
 			pr_err("Value of \"id\" is too big.\n");
 			return -EINVAL;
 		}
 		break;
 	default:
-		pr_err("Format for adding new device is \"id port_count\" (uint uint).\n");
+		pr_err("Format for adding new device is \"id port_count linecard_count\" (uint uint uint).\n");
 		return -EINVAL;
 	}
 
@@ -212,7 +217,7 @@ new_device_store(struct bus_type *bus, const char *buf, size_t count)
 		goto err;
 	}
 
-	nsim_bus_dev = nsim_bus_dev_new(id, port_count);
+	nsim_bus_dev = nsim_bus_dev_new(id, port_count, linecard_count);
 	if (IS_ERR(nsim_bus_dev)) {
 		err = PTR_ERR(nsim_bus_dev);
 		goto err;
@@ -312,7 +317,8 @@ static struct bus_type nsim_bus = {
 };
 
 static struct nsim_bus_dev *
-nsim_bus_dev_new(unsigned int id, unsigned int port_count)
+nsim_bus_dev_new(unsigned int id, unsigned int port_count,
+		 unsigned int linecard_count)
 {
 	struct nsim_bus_dev *nsim_bus_dev;
 	int err;
@@ -328,6 +334,7 @@ nsim_bus_dev_new(unsigned int id, unsigned int port_count)
 	nsim_bus_dev->dev.bus = &nsim_bus;
 	nsim_bus_dev->dev.type = &nsim_bus_dev_type;
 	nsim_bus_dev->port_count = port_count;
+	nsim_bus_dev->linecard_count = linecard_count;
 	nsim_bus_dev->initial_net = current->nsproxy->net_ns;
 	mutex_init(&nsim_bus_dev->nsim_bus_reload_lock);
 	/* Disallow using nsim_bus_dev */
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 816af1f55e2c..d81ccfa05a28 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -203,6 +203,10 @@ static int nsim_dev_debugfs_init(struct nsim_dev *nsim_dev)
 	nsim_dev->ports_ddir = debugfs_create_dir("ports", nsim_dev->ddir);
 	if (IS_ERR(nsim_dev->ports_ddir))
 		return PTR_ERR(nsim_dev->ports_ddir);
+	nsim_dev->linecards_ddir = debugfs_create_dir("linecards",
+						      nsim_dev->ddir);
+	if (IS_ERR(nsim_dev->linecards_ddir))
+		return PTR_ERR(nsim_dev->linecards_ddir);
 	debugfs_create_bool("fw_update_status", 0600, nsim_dev->ddir,
 			    &nsim_dev->fw_update_status);
 	debugfs_create_u32("fw_update_overwrite_mask", 0600, nsim_dev->ddir,
@@ -237,6 +241,7 @@ static int nsim_dev_debugfs_init(struct nsim_dev *nsim_dev)
 
 static void nsim_dev_debugfs_exit(struct nsim_dev *nsim_dev)
 {
+	debugfs_remove_recursive(nsim_dev->linecards_ddir);
 	debugfs_remove_recursive(nsim_dev->ports_ddir);
 	debugfs_remove_recursive(nsim_dev->ddir);
 }
@@ -265,6 +270,32 @@ static void nsim_dev_port_debugfs_exit(struct nsim_dev_port *nsim_dev_port)
 	debugfs_remove_recursive(nsim_dev_port->ddir);
 }
 
+static int
+nsim_dev_linecard_debugfs_init(struct nsim_dev *nsim_dev,
+			       struct nsim_dev_linecard *nsim_dev_linecard)
+{
+	char linecard_ddir_name[16];
+	char dev_link_name[32];
+
+	sprintf(linecard_ddir_name, "%u", nsim_dev_linecard->linecard_index);
+	nsim_dev_linecard->ddir = debugfs_create_dir(linecard_ddir_name,
+						     nsim_dev->linecards_ddir);
+	if (IS_ERR(nsim_dev_linecard->ddir))
+		return PTR_ERR(nsim_dev_linecard->ddir);
+
+	sprintf(dev_link_name, "../../../" DRV_NAME "%u",
+		nsim_dev->nsim_bus_dev->dev.id);
+	debugfs_create_symlink("dev", nsim_dev_linecard->ddir, dev_link_name);
+
+	return 0;
+}
+
+static void
+nsim_dev_linecard_debugfs_exit(struct nsim_dev_linecard *nsim_dev_linecard)
+{
+	debugfs_remove_recursive(nsim_dev_linecard->ddir);
+}
+
 static int nsim_dev_resources_register(struct devlink *devlink)
 {
 	struct devlink_resource_size_params params = {
@@ -998,6 +1029,64 @@ static int nsim_dev_port_add_all(struct nsim_dev *nsim_dev,
 	return err;
 }
 
+static int __nsim_dev_linecard_add(struct nsim_dev *nsim_dev,
+				   unsigned int linecard_index)
+{
+	struct nsim_dev_linecard *nsim_dev_linecard;
+	int err;
+
+	nsim_dev_linecard = kzalloc(sizeof(*nsim_dev_linecard), GFP_KERNEL);
+	if (!nsim_dev_linecard)
+		return -ENOMEM;
+	nsim_dev_linecard->nsim_dev = nsim_dev;
+	nsim_dev_linecard->linecard_index = linecard_index;
+
+	err = nsim_dev_linecard_debugfs_init(nsim_dev, nsim_dev_linecard);
+	if (err)
+		goto err_linecard_free;
+
+	list_add(&nsim_dev_linecard->list, &nsim_dev->linecard_list);
+
+	return 0;
+
+err_linecard_free:
+	kfree(nsim_dev_linecard);
+	return err;
+}
+
+static void __nsim_dev_linecard_del(struct nsim_dev_linecard *nsim_dev_linecard)
+{
+	list_del(&nsim_dev_linecard->list);
+	nsim_dev_linecard_debugfs_exit(nsim_dev_linecard);
+	kfree(nsim_dev_linecard);
+}
+
+static void nsim_dev_linecard_del_all(struct nsim_dev *nsim_dev)
+{
+	struct nsim_dev_linecard *nsim_dev_linecard, *tmp;
+
+	list_for_each_entry_safe(nsim_dev_linecard, tmp,
+				 &nsim_dev->linecard_list, list)
+		__nsim_dev_linecard_del(nsim_dev_linecard);
+}
+
+static int nsim_dev_linecard_add_all(struct nsim_dev *nsim_dev,
+				     unsigned int linecard_count)
+{
+	int i, err;
+
+	for (i = 0; i < linecard_count; i++) {
+		err = __nsim_dev_linecard_add(nsim_dev, i);
+		if (err)
+			goto err_linecard_del_all;
+	}
+	return 0;
+
+err_linecard_del_all:
+	nsim_dev_linecard_del_all(nsim_dev);
+	return err;
+}
+
 static int nsim_dev_reload_create(struct nsim_dev *nsim_dev,
 				  struct netlink_ext_ack *extack)
 {
@@ -1009,6 +1098,7 @@ static int nsim_dev_reload_create(struct nsim_dev *nsim_dev,
 	nsim_dev = devlink_priv(devlink);
 	INIT_LIST_HEAD(&nsim_dev->port_list);
 	mutex_init(&nsim_dev->port_list_lock);
+	INIT_LIST_HEAD(&nsim_dev->linecard_list);
 	nsim_dev->fw_update_status = true;
 	nsim_dev->fw_update_overwrite_mask = 0;
 
@@ -1030,10 +1120,14 @@ static int nsim_dev_reload_create(struct nsim_dev *nsim_dev,
 	if (err)
 		goto err_traps_exit;
 
-	err = nsim_dev_port_add_all(nsim_dev, nsim_bus_dev->port_count);
+	err = nsim_dev_linecard_add_all(nsim_dev, nsim_bus_dev->linecard_count);
 	if (err)
 		goto err_health_exit;
 
+	err = nsim_dev_port_add_all(nsim_dev, nsim_bus_dev->port_count);
+	if (err)
+		goto err_linecard_del_all;
+
 	nsim_dev->take_snapshot = debugfs_create_file("take_snapshot",
 						      0200,
 						      nsim_dev->ddir,
@@ -1041,6 +1135,8 @@ static int nsim_dev_reload_create(struct nsim_dev *nsim_dev,
 						&nsim_dev_take_snapshot_fops);
 	return 0;
 
+err_linecard_del_all:
+	nsim_dev_linecard_del_all(nsim_dev);
 err_health_exit:
 	nsim_dev_health_exit(nsim_dev);
 err_traps_exit:
@@ -1068,6 +1164,7 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
 	get_random_bytes(nsim_dev->switch_id.id, nsim_dev->switch_id.id_len);
 	INIT_LIST_HEAD(&nsim_dev->port_list);
 	mutex_init(&nsim_dev->port_list_lock);
+	INIT_LIST_HEAD(&nsim_dev->linecard_list);
 	nsim_dev->fw_update_status = true;
 	nsim_dev->fw_update_overwrite_mask = 0;
 	nsim_dev->max_macs = NSIM_DEV_MAX_MACS_DEFAULT;
@@ -1116,14 +1213,20 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
 	if (err)
 		goto err_health_exit;
 
-	err = nsim_dev_port_add_all(nsim_dev, nsim_bus_dev->port_count);
+	err = nsim_dev_linecard_add_all(nsim_dev, nsim_bus_dev->linecard_count);
 	if (err)
 		goto err_bpf_dev_exit;
 
+	err = nsim_dev_port_add_all(nsim_dev, nsim_bus_dev->port_count);
+	if (err)
+		goto err_linecard_del_all;
+
 	devlink_params_publish(devlink);
 	devlink_reload_enable(devlink);
 	return 0;
 
+err_linecard_del_all:
+	nsim_dev_linecard_del_all(nsim_dev);
 err_bpf_dev_exit:
 	nsim_bpf_dev_exit(nsim_dev);
 err_health_exit:
@@ -1156,6 +1259,7 @@ static void nsim_dev_reload_destroy(struct nsim_dev *nsim_dev)
 		return;
 	debugfs_remove(nsim_dev->take_snapshot);
 	nsim_dev_port_del_all(nsim_dev);
+	nsim_dev_linecard_del_all(nsim_dev);
 	nsim_dev_health_exit(nsim_dev);
 	nsim_dev_traps_exit(devlink);
 	nsim_dev_dummy_region_exit(nsim_dev);
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 48163c5f2ec9..df10f9d11e9d 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -180,20 +180,33 @@ struct nsim_dev_health {
 int nsim_dev_health_init(struct nsim_dev *nsim_dev, struct devlink *devlink);
 void nsim_dev_health_exit(struct nsim_dev *nsim_dev);
 
+struct nsim_dev_linecard;
+
 struct nsim_dev_port {
 	struct list_head list;
 	struct devlink_port devlink_port;
+	struct nsim_dev_linecard *linecard;
 	unsigned int port_index;
 	struct dentry *ddir;
 	struct netdevsim *ns;
 };
 
+struct nsim_dev;
+
+struct nsim_dev_linecard {
+	struct list_head list;
+	struct nsim_dev *nsim_dev;
+	unsigned int linecard_index;
+	struct dentry *ddir;
+};
+
 struct nsim_dev {
 	struct nsim_bus_dev *nsim_bus_dev;
 	struct nsim_fib_data *fib_data;
 	struct nsim_trap_data *trap_data;
 	struct dentry *ddir;
 	struct dentry *ports_ddir;
+	struct dentry *linecards_ddir;
 	struct dentry *take_snapshot;
 	struct bpf_offload_dev *bpf_dev;
 	bool bpf_bind_accept;
@@ -206,6 +219,7 @@ struct nsim_dev {
 	struct netdev_phys_item_id switch_id;
 	struct list_head port_list;
 	struct mutex port_list_lock; /* protects port list */
+	struct list_head linecard_list;
 	bool fw_update_status;
 	u32 fw_update_overwrite_mask;
 	u32 max_macs;
@@ -287,6 +301,7 @@ struct nsim_bus_dev {
 	struct device dev;
 	struct list_head list;
 	unsigned int port_count;
+	unsigned int linecard_count;
 	struct net *initial_net; /* Purpose of this is to carry net pointer
 				  * during the probe time only.
 				  */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [patch net-next RFC 07/10] netdevsim: allow port objects to be linked with line cards
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
                   ` (5 preceding siblings ...)
  2021-01-13 12:12 ` [patch net-next RFC 06/10] netdevsim: introduce line card support Jiri Pirko
@ 2021-01-13 12:12 ` Jiri Pirko
  2021-01-13 12:12 ` [patch net-next RFC 08/10] netdevsim: create devlink line card object and implement provisioning Jiri Pirko
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-13 12:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, jacob.e.keller, roopa, mlxsw

From: Jiri Pirko <jiri@nvidia.com>

Line cards contain ports. Allow ports to be places on the line cards.
Track the ports that belong under certain line card. Make sure that
the line card port carrier is down, as it will be taken up later on
during "activation".

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 drivers/net/netdevsim/bus.c       |  4 +--
 drivers/net/netdevsim/dev.c       | 48 +++++++++++++++++++++++++------
 drivers/net/netdevsim/netdev.c    |  2 ++
 drivers/net/netdevsim/netdevsim.h |  4 +++
 4 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/drivers/net/netdevsim/bus.c b/drivers/net/netdevsim/bus.c
index ed57c012e660..a0afd30d76e6 100644
--- a/drivers/net/netdevsim/bus.c
+++ b/drivers/net/netdevsim/bus.c
@@ -113,7 +113,7 @@ new_port_store(struct device *dev, struct device_attribute *attr,
 
 	mutex_lock(&nsim_bus_dev->nsim_bus_reload_lock);
 	devlink_reload_disable(devlink);
-	ret = nsim_dev_port_add(nsim_bus_dev, port_index);
+	ret = nsim_dev_port_add(nsim_bus_dev, NULL, port_index);
 	devlink_reload_enable(devlink);
 	mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
 	return ret ? ret : count;
@@ -142,7 +142,7 @@ del_port_store(struct device *dev, struct device_attribute *attr,
 
 	mutex_lock(&nsim_bus_dev->nsim_bus_reload_lock);
 	devlink_reload_disable(devlink);
-	ret = nsim_dev_port_del(nsim_bus_dev, port_index);
+	ret = nsim_dev_port_del(nsim_bus_dev, NULL, port_index);
 	devlink_reload_enable(devlink);
 	mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
 	return ret ? ret : count;
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index d81ccfa05a28..e706317fc0f9 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -35,6 +35,21 @@
 
 #include "netdevsim.h"
 
+#define NSIM_DEV_LINECARD_PORT_INDEX_BASE 1000
+#define NSIM_DEV_LINECARD_PORT_INDEX_STEP 100
+
+static unsigned int
+nsim_dev_port_index(struct nsim_dev_linecard *nsim_dev_linecard,
+		    unsigned int port_index)
+{
+	if (!nsim_dev_linecard)
+		return port_index;
+
+	return NSIM_DEV_LINECARD_PORT_INDEX_BASE +
+	       nsim_dev_linecard->linecard_index * NSIM_DEV_LINECARD_PORT_INDEX_STEP +
+	       port_index;
+}
+
 static struct dentry *nsim_dev_ddir;
 
 #define NSIM_DEV_DUMMY_REGION_SIZE (1024 * 32)
@@ -942,6 +957,7 @@ static const struct devlink_ops nsim_dev_devlink_ops = {
 #define NSIM_DEV_TEST1_DEFAULT true
 
 static int __nsim_dev_port_add(struct nsim_dev *nsim_dev,
+			       struct nsim_dev_linecard *nsim_dev_linecard,
 			       unsigned int port_index)
 {
 	struct devlink_port_attrs attrs = {};
@@ -952,8 +968,9 @@ static int __nsim_dev_port_add(struct nsim_dev *nsim_dev,
 	nsim_dev_port = kzalloc(sizeof(*nsim_dev_port), GFP_KERNEL);
 	if (!nsim_dev_port)
 		return -ENOMEM;
-	nsim_dev_port->port_index = port_index;
-
+	nsim_dev_port->port_index = nsim_dev_port_index(nsim_dev_linecard,
+							port_index);
+	nsim_dev_port->linecard = nsim_dev_linecard;
 	devlink_port = &nsim_dev_port->devlink_port;
 	attrs.flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
 	attrs.phys.port_number = port_index + 1;
@@ -961,7 +978,7 @@ static int __nsim_dev_port_add(struct nsim_dev *nsim_dev,
 	attrs.switch_id.id_len = nsim_dev->switch_id.id_len;
 	devlink_port_attrs_set(devlink_port, &attrs);
 	err = devlink_port_register(priv_to_devlink(nsim_dev), devlink_port,
-				    port_index);
+				    nsim_dev_port->port_index);
 	if (err)
 		goto err_port_free;
 
@@ -975,6 +992,11 @@ static int __nsim_dev_port_add(struct nsim_dev *nsim_dev,
 		goto err_port_debugfs_exit;
 	}
 
+	if (nsim_dev_linecard)
+		list_add(&nsim_dev_port->list_lc, &nsim_dev_linecard->port_list);
+	else
+		netif_carrier_on(nsim_dev_port->ns->netdev);
+
 	devlink_port_type_eth_set(devlink_port, nsim_dev_port->ns->netdev);
 	list_add(&nsim_dev_port->list, &nsim_dev->port_list);
 
@@ -994,6 +1016,8 @@ static void __nsim_dev_port_del(struct nsim_dev_port *nsim_dev_port)
 	struct devlink_port *devlink_port = &nsim_dev_port->devlink_port;
 
 	list_del(&nsim_dev_port->list);
+	if (nsim_dev_port->linecard)
+		list_del(&nsim_dev_port->list_lc);
 	devlink_port_type_clear(devlink_port);
 	nsim_destroy(nsim_dev_port->ns);
 	nsim_dev_port_debugfs_exit(nsim_dev_port);
@@ -1018,7 +1042,7 @@ static int nsim_dev_port_add_all(struct nsim_dev *nsim_dev,
 	int i, err;
 
 	for (i = 0; i < port_count; i++) {
-		err = __nsim_dev_port_add(nsim_dev, i);
+		err = __nsim_dev_port_add(nsim_dev, NULL, i);
 		if (err)
 			goto err_port_del_all;
 	}
@@ -1040,6 +1064,7 @@ static int __nsim_dev_linecard_add(struct nsim_dev *nsim_dev,
 		return -ENOMEM;
 	nsim_dev_linecard->nsim_dev = nsim_dev;
 	nsim_dev_linecard->linecard_index = linecard_index;
+	INIT_LIST_HEAD(&nsim_dev_linecard->port_list);
 
 	err = nsim_dev_linecard_debugfs_init(nsim_dev, nsim_dev_linecard);
 	if (err)
@@ -1286,10 +1311,13 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
 }
 
 static struct nsim_dev_port *
-__nsim_dev_port_lookup(struct nsim_dev *nsim_dev, unsigned int port_index)
+__nsim_dev_port_lookup(struct nsim_dev *nsim_dev,
+		       struct nsim_dev_linecard *nsim_dev_linecard,
+		       unsigned int port_index)
 {
 	struct nsim_dev_port *nsim_dev_port;
 
+	port_index = nsim_dev_port_index(nsim_dev_linecard, port_index);
 	list_for_each_entry(nsim_dev_port, &nsim_dev->port_list, list)
 		if (nsim_dev_port->port_index == port_index)
 			return nsim_dev_port;
@@ -1297,21 +1325,24 @@ __nsim_dev_port_lookup(struct nsim_dev *nsim_dev, unsigned int port_index)
 }
 
 int nsim_dev_port_add(struct nsim_bus_dev *nsim_bus_dev,
+		      struct nsim_dev_linecard *nsim_dev_linecard,
 		      unsigned int port_index)
 {
 	struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
 	int err;
 
 	mutex_lock(&nsim_dev->port_list_lock);
-	if (__nsim_dev_port_lookup(nsim_dev, port_index))
+	if (__nsim_dev_port_lookup(nsim_dev, nsim_dev_linecard, port_index))
 		err = -EEXIST;
 	else
-		err = __nsim_dev_port_add(nsim_dev, port_index);
+		err = __nsim_dev_port_add(nsim_dev, nsim_dev_linecard,
+					  port_index);
 	mutex_unlock(&nsim_dev->port_list_lock);
 	return err;
 }
 
 int nsim_dev_port_del(struct nsim_bus_dev *nsim_bus_dev,
+		      struct nsim_dev_linecard *nsim_dev_linecard,
 		      unsigned int port_index)
 {
 	struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
@@ -1319,7 +1350,8 @@ int nsim_dev_port_del(struct nsim_bus_dev *nsim_bus_dev,
 	int err = 0;
 
 	mutex_lock(&nsim_dev->port_list_lock);
-	nsim_dev_port = __nsim_dev_port_lookup(nsim_dev, port_index);
+	nsim_dev_port = __nsim_dev_port_lookup(nsim_dev, nsim_dev_linecard,
+					       port_index);
 	if (!nsim_dev_port)
 		err = -ENOENT;
 	else
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index aec92440eef1..1e0dc298bf20 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -312,6 +312,8 @@ nsim_create(struct nsim_dev *nsim_dev, struct nsim_dev_port *nsim_dev_port)
 
 	nsim_ipsec_init(ns);
 
+	netif_carrier_off(dev);
+
 	err = register_netdevice(dev);
 	if (err)
 		goto err_ipsec_teardown;
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index df10f9d11e9d..88b61b9390bf 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -184,6 +184,7 @@ struct nsim_dev_linecard;
 
 struct nsim_dev_port {
 	struct list_head list;
+	struct list_head list_lc; /* node in linecard list */
 	struct devlink_port devlink_port;
 	struct nsim_dev_linecard *linecard;
 	unsigned int port_index;
@@ -196,6 +197,7 @@ struct nsim_dev;
 struct nsim_dev_linecard {
 	struct list_head list;
 	struct nsim_dev *nsim_dev;
+	struct list_head port_list;
 	unsigned int linecard_index;
 	struct dentry *ddir;
 };
@@ -255,8 +257,10 @@ void nsim_dev_exit(void);
 int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev);
 void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev);
 int nsim_dev_port_add(struct nsim_bus_dev *nsim_bus_dev,
+		      struct nsim_dev_linecard *nsim_dev_linecard,
 		      unsigned int port_index);
 int nsim_dev_port_del(struct nsim_bus_dev *nsim_bus_dev,
+		      struct nsim_dev_linecard *nsim_dev_linecard,
 		      unsigned int port_index);
 
 struct nsim_fib_data *nsim_fib_create(struct devlink *devlink,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [patch net-next RFC 08/10] netdevsim: create devlink line card object and implement provisioning
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
                   ` (6 preceding siblings ...)
  2021-01-13 12:12 ` [patch net-next RFC 07/10] netdevsim: allow port objects to be linked with line cards Jiri Pirko
@ 2021-01-13 12:12 ` Jiri Pirko
  2021-01-15 16:30   ` Ido Schimmel
  2021-01-13 12:12 ` [patch net-next RFC 09/10] netdevsim: implement line card activation Jiri Pirko
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-13 12:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, jacob.e.keller, roopa, mlxsw

From: Jiri Pirko <jiri@nvidia.com>

Use devlink_linecard_create/destroy() to register the line card with
devlink core. Implement provisioning ops with a list of supported
line cards. To avoid deadlock and to mimic actual HW flow, use workqueue
to add/del ports during provisioning as the port add/del calls
devlink_port_register/unregister() which take devlink mutex.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 drivers/net/netdevsim/dev.c       | 135 +++++++++++++++++++++++++++++-
 drivers/net/netdevsim/netdevsim.h |   4 +
 2 files changed, 138 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index e706317fc0f9..9e9a2a75ddf8 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -35,6 +35,20 @@
 
 #include "netdevsim.h"
 
+static const char * const nsim_dev_linecard_supported_types[] = {
+	"card1port", "card2ports", "card4ports",
+};
+
+static const unsigned int nsim_dev_linecard_port_counts[] = {
+	1, 2, 4,
+};
+
+static unsigned int
+nsim_dev_linecard_port_count(struct nsim_dev_linecard *nsim_dev_linecard)
+{
+	return nsim_dev_linecard_port_counts[nsim_dev_linecard->type_index];
+}
+
 #define NSIM_DEV_LINECARD_PORT_INDEX_BASE 1000
 #define NSIM_DEV_LINECARD_PORT_INDEX_STEP 100
 
@@ -285,6 +299,25 @@ static void nsim_dev_port_debugfs_exit(struct nsim_dev_port *nsim_dev_port)
 	debugfs_remove_recursive(nsim_dev_port->ddir);
 }
 
+static ssize_t nsim_dev_linecard_type_read(struct file *file, char __user *data,
+					   size_t count, loff_t *ppos)
+{
+	struct nsim_dev_linecard *nsim_dev_linecard = file->private_data;
+	const char *type;
+
+	if (!nsim_dev_linecard->provisioned)
+		return -EOPNOTSUPP;
+
+	type = nsim_dev_linecard_supported_types[nsim_dev_linecard->type_index];
+	return simple_read_from_buffer(data, count, ppos, type, strlen(type));
+}
+
+static const struct file_operations nsim_dev_linecard_type_fops = {
+	.open = simple_open,
+	.read = nsim_dev_linecard_type_read,
+	.owner = THIS_MODULE,
+};
+
 static int
 nsim_dev_linecard_debugfs_init(struct nsim_dev *nsim_dev,
 			       struct nsim_dev_linecard *nsim_dev_linecard)
@@ -301,6 +334,8 @@ nsim_dev_linecard_debugfs_init(struct nsim_dev *nsim_dev,
 	sprintf(dev_link_name, "../../../" DRV_NAME "%u",
 		nsim_dev->nsim_bus_dev->dev.id);
 	debugfs_create_symlink("dev", nsim_dev_linecard->ddir, dev_link_name);
+	debugfs_create_file("type", 0400, nsim_dev_linecard->ddir,
+			    nsim_dev_linecard, &nsim_dev_linecard_type_fops);
 
 	return 0;
 }
@@ -977,6 +1012,9 @@ static int __nsim_dev_port_add(struct nsim_dev *nsim_dev,
 	memcpy(attrs.switch_id.id, nsim_dev->switch_id.id, nsim_dev->switch_id.id_len);
 	attrs.switch_id.id_len = nsim_dev->switch_id.id_len;
 	devlink_port_attrs_set(devlink_port, &attrs);
+	if (nsim_dev_linecard)
+		devlink_port_linecard_set(devlink_port,
+					  nsim_dev_linecard->devlink_linecard);
 	err = devlink_port_register(priv_to_devlink(nsim_dev), devlink_port,
 				    nsim_dev_port->port_index);
 	if (err)
@@ -1053,10 +1091,88 @@ static int nsim_dev_port_add_all(struct nsim_dev *nsim_dev,
 	return err;
 }
 
+static void nsim_dev_linecard_provision_work(struct work_struct *work)
+{
+	struct nsim_dev_linecard *nsim_dev_linecard;
+	struct nsim_bus_dev *nsim_bus_dev;
+	int err;
+	int i;
+
+	nsim_dev_linecard = container_of(work, struct nsim_dev_linecard,
+					 provision_work);
+
+	nsim_bus_dev = nsim_dev_linecard->nsim_dev->nsim_bus_dev;
+	for (i = 0; i < nsim_dev_linecard_port_count(nsim_dev_linecard); i++) {
+		err = nsim_dev_port_add(nsim_bus_dev, nsim_dev_linecard, i);
+		if (err)
+			goto err_port_del_all;
+	}
+	nsim_dev_linecard->provisioned = true;
+	devlink_linecard_provision_set(nsim_dev_linecard->devlink_linecard,
+				       nsim_dev_linecard->type_index);
+	return;
+
+err_port_del_all:
+	for (i--; i >= 0; i--)
+		nsim_dev_port_del(nsim_bus_dev, nsim_dev_linecard, i);
+	devlink_linecard_provision_clear(nsim_dev_linecard->devlink_linecard);
+}
+
+static int nsim_dev_linecard_provision(struct devlink_linecard *linecard,
+				       void *priv, u32 type_index,
+				       struct netlink_ext_ack *extack)
+{
+	struct nsim_dev_linecard *nsim_dev_linecard = priv;
+
+	nsim_dev_linecard->type_index = type_index;
+	INIT_WORK(&nsim_dev_linecard->provision_work,
+		  nsim_dev_linecard_provision_work);
+	schedule_work(&nsim_dev_linecard->provision_work);
+
+	return 0;
+}
+
+static void nsim_dev_linecard_unprovision_work(struct work_struct *work)
+{
+	struct nsim_dev_linecard *nsim_dev_linecard;
+	struct nsim_bus_dev *nsim_bus_dev;
+	int i;
+
+	nsim_dev_linecard = container_of(work, struct nsim_dev_linecard,
+					 provision_work);
+
+	nsim_bus_dev = nsim_dev_linecard->nsim_dev->nsim_bus_dev;
+	nsim_dev_linecard->provisioned = false;
+	devlink_linecard_provision_clear(nsim_dev_linecard->devlink_linecard);
+	for (i = 0; i < nsim_dev_linecard_port_count(nsim_dev_linecard); i++)
+		nsim_dev_port_del(nsim_bus_dev, nsim_dev_linecard, i);
+}
+
+static int nsim_dev_linecard_unprovision(struct devlink_linecard *linecard,
+					 void *priv,
+					 struct netlink_ext_ack *extack)
+{
+	struct nsim_dev_linecard *nsim_dev_linecard = priv;
+
+	INIT_WORK(&nsim_dev_linecard->provision_work,
+		  nsim_dev_linecard_unprovision_work);
+	schedule_work(&nsim_dev_linecard->provision_work);
+
+	return 0;
+}
+
+static const struct devlink_linecard_ops nsim_dev_linecard_ops = {
+	.supported_types = nsim_dev_linecard_supported_types,
+	.supported_types_count = ARRAY_SIZE(nsim_dev_linecard_supported_types),
+	.provision = nsim_dev_linecard_provision,
+	.unprovision = nsim_dev_linecard_unprovision,
+};
+
 static int __nsim_dev_linecard_add(struct nsim_dev *nsim_dev,
 				   unsigned int linecard_index)
 {
 	struct nsim_dev_linecard *nsim_dev_linecard;
+	struct devlink_linecard *devlink_linecard;
 	int err;
 
 	nsim_dev_linecard = kzalloc(sizeof(*nsim_dev_linecard), GFP_KERNEL);
@@ -1066,14 +1182,27 @@ static int __nsim_dev_linecard_add(struct nsim_dev *nsim_dev,
 	nsim_dev_linecard->linecard_index = linecard_index;
 	INIT_LIST_HEAD(&nsim_dev_linecard->port_list);
 
+	devlink_linecard = devlink_linecard_create(priv_to_devlink(nsim_dev),
+						   linecard_index,
+						   &nsim_dev_linecard_ops,
+						   nsim_dev_linecard);
+	if (IS_ERR(devlink_linecard)) {
+		err = PTR_ERR(devlink_linecard);
+		goto err_linecard_free;
+	}
+
+	nsim_dev_linecard->devlink_linecard = devlink_linecard;
+
 	err = nsim_dev_linecard_debugfs_init(nsim_dev, nsim_dev_linecard);
 	if (err)
-		goto err_linecard_free;
+		goto err_dl_linecard_destroy;
 
 	list_add(&nsim_dev_linecard->list, &nsim_dev->linecard_list);
 
 	return 0;
 
+err_dl_linecard_destroy:
+	devlink_linecard_destroy(devlink_linecard);
 err_linecard_free:
 	kfree(nsim_dev_linecard);
 	return err;
@@ -1081,8 +1210,12 @@ static int __nsim_dev_linecard_add(struct nsim_dev *nsim_dev,
 
 static void __nsim_dev_linecard_del(struct nsim_dev_linecard *nsim_dev_linecard)
 {
+	struct devlink_linecard *devlink_linecard =
+					nsim_dev_linecard->devlink_linecard;
+
 	list_del(&nsim_dev_linecard->list);
 	nsim_dev_linecard_debugfs_exit(nsim_dev_linecard);
+	devlink_linecard_destroy(devlink_linecard);
 	kfree(nsim_dev_linecard);
 }
 
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 88b61b9390bf..ab217b361416 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -196,10 +196,14 @@ struct nsim_dev;
 
 struct nsim_dev_linecard {
 	struct list_head list;
+	struct devlink_linecard *devlink_linecard;
 	struct nsim_dev *nsim_dev;
 	struct list_head port_list;
 	unsigned int linecard_index;
 	struct dentry *ddir;
+	bool provisioned;
+	u32 type_index;
+	struct work_struct provision_work;
 };
 
 struct nsim_dev {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [patch net-next RFC 09/10] netdevsim: implement line card activation
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
                   ` (7 preceding siblings ...)
  2021-01-13 12:12 ` [patch net-next RFC 08/10] netdevsim: create devlink line card object and implement provisioning Jiri Pirko
@ 2021-01-13 12:12 ` Jiri Pirko
  2021-01-13 12:12 ` [patch net-next RFC 10/10] selftests: add netdevsim devlink lc test Jiri Pirko
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-13 12:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, jacob.e.keller, roopa, mlxsw

From: Jiri Pirko <jiri@nvidia.com>

On real HW, the activation typically happens upon line card insertion.
Emulate such event using write to debugfs file "active".

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 drivers/net/netdevsim/dev.c | 81 +++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 9e9a2a75ddf8..81d68269e121 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -64,6 +64,30 @@ nsim_dev_port_index(struct nsim_dev_linecard *nsim_dev_linecard,
 	       port_index;
 }
 
+static int
+nsim_dev_linecard_activate(struct nsim_dev_linecard *nsim_dev_linecard)
+{
+	struct nsim_dev_port *nsim_dev_port;
+
+	list_for_each_entry(nsim_dev_port, &nsim_dev_linecard->port_list,
+			    list_lc)
+		netif_carrier_on(nsim_dev_port->ns->netdev);
+
+	devlink_linecard_activate(nsim_dev_linecard->devlink_linecard);
+	return 0;
+}
+
+static void
+nsim_dev_linecard_deactivate(struct nsim_dev_linecard *nsim_dev_linecard)
+{
+	struct nsim_dev_port *nsim_dev_port;
+
+	list_for_each_entry(nsim_dev_port, &nsim_dev_linecard->port_list,
+			    list_lc)
+		netif_carrier_off(nsim_dev_port->ns->netdev);
+	devlink_linecard_deactivate(nsim_dev_linecard->devlink_linecard);
+}
+
 static struct dentry *nsim_dev_ddir;
 
 #define NSIM_DEV_DUMMY_REGION_SIZE (1024 * 32)
@@ -299,6 +323,61 @@ static void nsim_dev_port_debugfs_exit(struct nsim_dev_port *nsim_dev_port)
 	debugfs_remove_recursive(nsim_dev_port->ddir);
 }
 
+static ssize_t nsim_dev_linecard_active_read(struct file *file,
+					     char __user *data,
+					     size_t count, loff_t *ppos)
+{
+	struct nsim_dev_linecard *nsim_dev_linecard = file->private_data;
+	char buf[3];
+
+	if (!nsim_dev_linecard->provisioned)
+		return -EOPNOTSUPP;
+
+	if (devlink_linecard_is_active(nsim_dev_linecard->devlink_linecard))
+		buf[0] = 'Y';
+	else
+		buf[0] = 'N';
+	buf[1] = '\n';
+	buf[2] = 0x00;
+	return simple_read_from_buffer(data, count, ppos, buf, strlen(buf));
+}
+
+static ssize_t nsim_dev_linecard_active_write(struct file *file,
+					      const char __user *data,
+					      size_t count, loff_t *ppos)
+{
+	struct nsim_dev_linecard *nsim_dev_linecard = file->private_data;
+	bool active;
+	bool bv;
+	int err;
+	int r;
+
+	if (!nsim_dev_linecard->provisioned)
+		return -EOPNOTSUPP;
+
+	active = devlink_linecard_is_active(nsim_dev_linecard->devlink_linecard);
+
+	r = kstrtobool_from_user(data, count, &bv);
+	if (!r && active != bv) {
+		if (bv) {
+			err = nsim_dev_linecard_activate(nsim_dev_linecard);
+			if (err)
+				return err;
+		} else {
+			nsim_dev_linecard_deactivate(nsim_dev_linecard);
+		}
+	}
+	return count;
+}
+
+static const struct file_operations nsim_dev_linecard_active_fops = {
+	.open = simple_open,
+	.read = nsim_dev_linecard_active_read,
+	.write = nsim_dev_linecard_active_write,
+	.llseek = generic_file_llseek,
+	.owner = THIS_MODULE,
+};
+
 static ssize_t nsim_dev_linecard_type_read(struct file *file, char __user *data,
 					   size_t count, loff_t *ppos)
 {
@@ -334,6 +413,8 @@ nsim_dev_linecard_debugfs_init(struct nsim_dev *nsim_dev,
 	sprintf(dev_link_name, "../../../" DRV_NAME "%u",
 		nsim_dev->nsim_bus_dev->dev.id);
 	debugfs_create_symlink("dev", nsim_dev_linecard->ddir, dev_link_name);
+	debugfs_create_file("active", 0600, nsim_dev_linecard->ddir,
+			    nsim_dev_linecard, &nsim_dev_linecard_active_fops);
 	debugfs_create_file("type", 0400, nsim_dev_linecard->ddir,
 			    nsim_dev_linecard, &nsim_dev_linecard_type_fops);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [patch net-next RFC 10/10] selftests: add netdevsim devlink lc test
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
                   ` (8 preceding siblings ...)
  2021-01-13 12:12 ` [patch net-next RFC 09/10] netdevsim: implement line card activation Jiri Pirko
@ 2021-01-13 12:12 ` Jiri Pirko
  2021-01-13 13:39 ` [patch iproute2/net-next RFC] devlink: add support for linecard show and provision Jiri Pirko
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-13 12:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, jacob.e.keller, roopa, mlxsw

From: Jiri Pirko <jiri@nvidia.com>

Add test to verify netdevsim driver line card functionality.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 .../drivers/net/netdevsim/devlink.sh          | 62 ++++++++++++++++++-
 1 file changed, 60 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
index 40909c254365..c33cf13d3bf3 100755
--- a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
@@ -5,12 +5,13 @@ lib_dir=$(dirname $0)/../../../net/forwarding
 
 ALL_TESTS="fw_flash_test params_test regions_test reload_test \
 	   netns_reload_test resource_test dev_info_test \
-	   empty_reporter_test dummy_reporter_test"
+	   empty_reporter_test dummy_reporter_test linecard_test"
 NUM_NETIFS=0
 source $lib_dir/lib.sh
 
 BUS_ADDR=10
 PORT_COUNT=4
+LINECARD_COUNT=2
 DEV_NAME=netdevsim$BUS_ADDR
 SYSFS_NET_DIR=/sys/bus/netdevsim/devices/$DEV_NAME/net/
 DEBUGFS_DIR=/sys/kernel/debug/netdevsim/$DEV_NAME/
@@ -507,10 +508,67 @@ dummy_reporter_test()
 	log_test "dummy reporter test"
 }
 
+check_linecards_state()
+{
+	local expected_state_0=$1
+	local expected_state_1=$2
+
+	local state=$(devlink lc show $DL_HANDLE lc 0 -j | jq -e -r ".[][][].state")
+	check_err $? "Failed to get linecard 0 state"
+
+	[ "$state" == "$expected_state_0" ]
+	check_err $? "Unexpected linecard 0 state (got $state, expected $expected_state_0)"
+
+	local state=$(devlink lc show $DL_HANDLE lc 1 -j | jq -e -r ".[][][].state")
+	check_err $? "Failed to get linecard 1 state"
+
+	[ "$state" == "$expected_state_1" ]
+	check_err $? "Unexpected linecard 1 state (got $state, expected $expected_state_1)"
+}
+
+linecard_test()
+{
+	RET=0
+
+	check_linecards_state "unprovisioned" "unprovisioned"
+
+	devlink lc provision $DL_HANDLE lc 0 type card2ports
+	check_err $? "Failed to provision linecard 0 with card2ports"
+
+	check_linecards_state "provisioned" "unprovisioned"
+
+	devlink lc provision $DL_HANDLE lc 1 type card4ports
+	check_err $? "Failed to provision linecard 0 with card4ports"
+
+	check_linecards_state "provisioned" "provisioned"
+
+	echo "Y"> $DEBUGFS_DIR/linecards/0/active
+	check_err $? "Failed to set lincard 0 active"
+
+	check_linecards_state "active" "provisioned"
+
+	echo "Y"> $DEBUGFS_DIR/linecards/1/active
+	check_err $? "Failed to set lincard 1 active"
+
+	check_linecards_state "active" "active"
+
+	devlink lc unprovision $DL_HANDLE lc 0
+	check_err $? "Failed to unprovision linecard 0"
+
+	check_linecards_state "unprovisioned" "active"
+
+	devlink lc unprovision $DL_HANDLE lc 1
+	check_err $? "Failed to unprovision linecard 1"
+
+	check_linecards_state "unprovisioned" "unprovisioned"
+
+	log_test "linecard test"
+}
+
 setup_prepare()
 {
 	modprobe netdevsim
-	echo "$BUS_ADDR $PORT_COUNT" > /sys/bus/netdevsim/new_device
+	echo "$BUS_ADDR $PORT_COUNT $LINECARD_COUNT" > /sys/bus/netdevsim/new_device
 	while [ ! -d $SYSFS_NET_DIR ] ; do :; done
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [patch iproute2/net-next RFC] devlink: add support for linecard show and provision
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
                   ` (9 preceding siblings ...)
  2021-01-13 12:12 ` [patch net-next RFC 10/10] selftests: add netdevsim devlink lc test Jiri Pirko
@ 2021-01-13 13:39 ` Jiri Pirko
  2021-01-14  2:07 ` [patch net-next RFC 00/10] introduce line card support for modular switch Andrew Lunn
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-13 13:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, jacob.e.keller, roopa, mlxsw

From: Jiri Pirko <jiri@nvidia.com>

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 devlink/devlink.c | 218 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 214 insertions(+), 4 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index a2e066441e8a..960f1078591e 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -306,6 +306,8 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_FLASH_OVERWRITE		BIT(39)
 #define DL_OPT_RELOAD_ACTION		BIT(40)
 #define DL_OPT_RELOAD_LIMIT	BIT(41)
+#define DL_OPT_LINECARD		BIT(42)
+#define DL_OPT_LINECARD_TYPE	BIT(43)
 
 struct dl_opts {
 	uint64_t present; /* flags of present items */
@@ -356,6 +358,8 @@ struct dl_opts {
 	uint32_t overwrite_mask;
 	enum devlink_reload_action reload_action;
 	enum devlink_reload_limit reload_limit;
+	uint32_t linecard_index;
+	const char *linecard_type;
 };
 
 struct dl {
@@ -1414,6 +1418,8 @@ static const struct dl_args_metadata dl_args_required[] = {
 	{DL_OPT_TRAP_NAME,            "Trap's name is expected."},
 	{DL_OPT_TRAP_GROUP_NAME,      "Trap group's name is expected."},
 	{DL_OPT_PORT_FUNCTION_HW_ADDR, "Port function's hardware address is expected."},
+	{DL_OPT_LINECARD,	      "Linecard index expected."},
+	{DL_OPT_LINECARD_TYPE,	      "Linecard type expected."},
 };
 
 static int dl_args_finding_required_validate(uint64_t o_required,
@@ -1832,7 +1838,20 @@ static int dl_argv_parse(struct dl *dl, uint64_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_PORT_FUNCTION_HW_ADDR;
-
+		} else if (dl_argv_match(dl, "lc") &&
+			   (o_all & DL_OPT_LINECARD)) {
+			dl_arg_inc(dl);
+			err = dl_argv_uint32_t(dl, &opts->linecard_index);
+			if (err)
+				return err;
+			o_found |= DL_OPT_LINECARD;
+		} else if (dl_argv_match(dl, "type") &&
+			   (o_all & DL_OPT_LINECARD_TYPE)) {
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &opts->linecard_type);
+			if (err)
+				return err;
+			o_found |= DL_OPT_LINECARD_TYPE;
 		} else {
 			pr_err("Unknown option \"%s\"\n", dl_argv(dl));
 			return -EINVAL;
@@ -2015,6 +2034,12 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 				 opts->trap_policer_burst);
 	if (opts->present & DL_OPT_PORT_FUNCTION_HW_ADDR)
 		dl_function_attr_put(nlh, opts);
+	if (opts->present & DL_OPT_LINECARD)
+		mnl_attr_put_u32(nlh, DEVLINK_ATTR_LINECARD_INDEX,
+				 opts->linecard_index);
+	if (opts->present & DL_OPT_LINECARD_TYPE)
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_LINECARD_TYPE,
+				  opts->linecard_type);
 }
 
 static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
@@ -2036,6 +2061,7 @@ static bool dl_dump_filter(struct dl *dl, struct nlattr **tb)
 	struct nlattr *attr_dev_name = tb[DEVLINK_ATTR_DEV_NAME];
 	struct nlattr *attr_port_index = tb[DEVLINK_ATTR_PORT_INDEX];
 	struct nlattr *attr_sb_index = tb[DEVLINK_ATTR_SB_INDEX];
+	struct nlattr *attr_linecard_index = tb[DEVLINK_ATTR_LINECARD_INDEX];
 
 	if (opts->present & DL_OPT_HANDLE &&
 	    attr_bus_name && attr_dev_name) {
@@ -2063,6 +2089,12 @@ static bool dl_dump_filter(struct dl *dl, struct nlattr **tb)
 		if (sb_index != opts->sb_index)
 			return false;
 	}
+	if (opts->present & DL_OPT_LINECARD && attr_linecard_index) {
+		uint32_t linecard_index = mnl_attr_get_u32(attr_linecard_index);
+
+		if (linecard_index != opts->linecard_index)
+			return false;
+	}
 	return true;
 }
 
@@ -3833,6 +3865,9 @@ static void pr_out_port(struct dl *dl, struct nlattr **tb)
 			break;
 		}
 	}
+	if (tb[DEVLINK_ATTR_LINECARD_INDEX])
+		print_uint(PRINT_ANY, "lc", " lc %u",
+			   mnl_attr_get_u32(tb[DEVLINK_ATTR_LINECARD_INDEX]));
 	if (tb[DEVLINK_ATTR_PORT_NUMBER]) {
 		uint32_t port_number;
 
@@ -4005,6 +4040,156 @@ static int cmd_port(struct dl *dl)
 	return -ENOENT;
 }
 
+static void cmd_linecard_help(void)
+{
+	pr_err("Usage: devlink lc show [ DEV [ lc LC_INDEX ] ]\n");
+	pr_err("       devlink lc provision DEV lc LC_INDEX type LC_TYPE\n");
+	pr_err("       devlink lc unprovision DEV lc LC_INDEX\n");
+}
+
+static const char *linecard_state_name(uint16_t flavour)
+{
+	switch (flavour) {
+	case DEVLINK_LINECARD_STATE_UNPROVISIONED:
+		return "unprovisioned";
+	case DEVLINK_LINECARD_STATE_UNPROVISIONING:
+		return "unprovisioning";
+	case DEVLINK_LINECARD_STATE_PROVISIONING:
+		return "provisioning";
+	case DEVLINK_LINECARD_STATE_PROVISIONED:
+		return "provisioned";
+	case DEVLINK_LINECARD_STATE_ACTIVE:
+		return "active";
+	default:
+		return "<unknown state>";
+	}
+}
+
+static void pr_out_linecard_supported_types(struct dl *dl, struct nlattr **tb)
+{
+	struct nlattr *nla_types = tb[DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES];
+	struct nlattr *nla_type;
+
+	if (!nla_types)
+		return;
+
+	pr_out_array_start(dl, "supported_types");
+	check_indent_newline(dl);
+	mnl_attr_for_each_nested(nla_type, nla_types) {
+		print_string(PRINT_ANY, NULL, " %s",
+			     mnl_attr_get_str(nla_type));
+	}
+	pr_out_array_end(dl);
+}
+
+static void pr_out_linecard(struct dl *dl, struct nlattr **tb)
+{
+	uint8_t state;
+
+	pr_out_handle_start_arr(dl, tb);
+	check_indent_newline(dl);
+	print_uint(PRINT_ANY, "lc", "lc %u",
+		   mnl_attr_get_u32(tb[DEVLINK_ATTR_LINECARD_INDEX]));
+	state = mnl_attr_get_u8(tb[DEVLINK_ATTR_LINECARD_STATE]);
+	print_string(PRINT_ANY, "state", " state %s",
+		     linecard_state_name(state));
+	if (tb[DEVLINK_ATTR_LINECARD_TYPE])
+		print_string(PRINT_ANY, "type", " type %s",
+			     mnl_attr_get_str(tb[DEVLINK_ATTR_LINECARD_TYPE]));
+	pr_out_linecard_supported_types(dl, tb);
+	pr_out_handle_end(dl);
+}
+
+static int cmd_linecard_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct dl *dl = data;
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_LINECARD_INDEX] ||
+	    !tb[DEVLINK_ATTR_LINECARD_STATE])
+		return MNL_CB_ERROR;
+	pr_out_linecard(dl, tb);
+	return MNL_CB_OK;
+}
+
+static int cmd_linecard_show(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+	int err;
+
+	if (dl_argc(dl) == 0)
+		flags |= NLM_F_DUMP;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_LINECARD_GET, flags);
+
+	if (dl_argc(dl) > 0) {
+		err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE,
+					DL_OPT_LINECARD);
+		if (err)
+			return err;
+	}
+
+	pr_out_section_start(dl, "lc");
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_linecard_show_cb, dl);
+	pr_out_section_end(dl);
+	return err;
+}
+
+static int cmd_linecard_provision(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_LINECARD_PROVISION,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE | DL_OPT_LINECARD |
+					 DL_OPT_LINECARD_TYPE, 0);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_linecard_unprovision(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_LINECARD_UNPROVISION,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE | DL_OPT_LINECARD, 0);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_linecard(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help")) {
+		cmd_linecard_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show") ||
+		   dl_argv_match(dl, "list") || dl_no_arg(dl)) {
+		dl_arg_inc(dl);
+		return cmd_linecard_show(dl);
+	} else if (dl_argv_match(dl, "provision")) {
+		dl_arg_inc(dl);
+		return cmd_linecard_provision(dl);
+	} else if (dl_argv_match(dl, "unprovision")) {
+		dl_arg_inc(dl);
+		return cmd_linecard_unprovision(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
 static void cmd_sb_help(void)
 {
 	pr_err("Usage: devlink sb show [ DEV [ sb SB_INDEX ] ]\n");
@@ -4818,6 +5003,10 @@ static const char *cmd_name(uint8_t cmd)
 	case DEVLINK_CMD_TRAP_POLICER_SET: return "set";
 	case DEVLINK_CMD_TRAP_POLICER_NEW: return "new";
 	case DEVLINK_CMD_TRAP_POLICER_DEL: return "del";
+	case DEVLINK_CMD_LINECARD_GET: return "get";
+	case DEVLINK_CMD_LINECARD_SET: return "set";
+	case DEVLINK_CMD_LINECARD_NEW: return "new";
+	case DEVLINK_CMD_LINECARD_DEL: return "del";
 	default: return "<unknown cmd>";
 	}
 }
@@ -4867,6 +5056,11 @@ static const char *cmd_obj(uint8_t cmd)
 	case DEVLINK_CMD_TRAP_POLICER_NEW:
 	case DEVLINK_CMD_TRAP_POLICER_DEL:
 		return "trap-policer";
+	case DEVLINK_CMD_LINECARD_GET:
+	case DEVLINK_CMD_LINECARD_SET:
+	case DEVLINK_CMD_LINECARD_NEW:
+	case DEVLINK_CMD_LINECARD_DEL:
+		return "lc";
 	default: return "<unknown obj>";
 	}
 }
@@ -5059,6 +5253,18 @@ static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
 		pr_out_mon_header(genl->cmd);
 		pr_out_trap_policer(dl, tb, false);
 		break;
+	case DEVLINK_CMD_LINECARD_GET: /* fall through */
+	case DEVLINK_CMD_LINECARD_SET: /* fall through */
+	case DEVLINK_CMD_LINECARD_NEW: /* fall through */
+	case DEVLINK_CMD_LINECARD_DEL:
+		mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+		if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+		    !tb[DEVLINK_ATTR_LINECARD_INDEX])
+			return MNL_CB_ERROR;
+		pr_out_mon_header(genl->cmd);
+		pr_out_linecard(dl, tb);
+		pr_out_mon_footer();
+		break;
 	}
 	fflush(stdout);
 	return MNL_CB_OK;
@@ -5077,7 +5283,8 @@ static int cmd_mon_show(struct dl *dl)
 		    strcmp(cur_obj, "health") != 0 &&
 		    strcmp(cur_obj, "trap") != 0 &&
 		    strcmp(cur_obj, "trap-group") != 0 &&
-		    strcmp(cur_obj, "trap-policer") != 0) {
+		    strcmp(cur_obj, "trap-policer") != 0 &&
+		    strcmp(cur_obj, "lc") != 0) {
 			pr_err("Unknown object \"%s\"\n", cur_obj);
 			return -EINVAL;
 		}
@@ -5098,7 +5305,7 @@ static int cmd_mon_show(struct dl *dl)
 static void cmd_mon_help(void)
 {
 	pr_err("Usage: devlink monitor [ all | OBJECT-LIST ]\n"
-	       "where  OBJECT-LIST := { dev | port | health | trap | trap-group | trap-policer }\n");
+	       "where  OBJECT-LIST := { dev | port | lc | health | trap | trap-group | trap-policer }\n");
 }
 
 static int cmd_mon(struct dl *dl)
@@ -8073,7 +8280,7 @@ static void help(void)
 {
 	pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
 	       "       devlink [ -f[orce] ] -b[atch] filename -N[etns] netnsname\n"
-	       "where  OBJECT := { dev | port | sb | monitor | dpipe | resource | region | health | trap }\n"
+	       "where  OBJECT := { dev | port | lc | sb | monitor | dpipe | resource | region | health | trap }\n"
 	       "       OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] -s[tatistics] }\n");
 }
 
@@ -8112,6 +8319,9 @@ static int dl_cmd(struct dl *dl, int argc, char **argv)
 	} else if (dl_argv_match(dl, "trap")) {
 		dl_arg_inc(dl);
 		return cmd_trap(dl);
+	} else if (dl_argv_match(dl, "lc")) {
+		dl_arg_inc(dl);
+		return cmd_linecard(dl);
 	}
 	pr_err("Object \"%s\" not found\n", dl_argv(dl));
 	return -ENOENT;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
                   ` (10 preceding siblings ...)
  2021-01-13 13:39 ` [patch iproute2/net-next RFC] devlink: add support for linecard show and provision Jiri Pirko
@ 2021-01-14  2:07 ` Andrew Lunn
  2021-01-14  7:39   ` Jiri Pirko
  2021-01-19 11:56   ` Jiri Pirko
  2021-01-14  2:27 ` Jakub Kicinski
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 80+ messages in thread
From: Andrew Lunn @ 2021-01-14  2:07 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

> $ devlink lc provision netdevsim/netdevsim10 lc 0 type card4ports
> $ devlink lc
> netdevsim/netdevsim10:
>   lc 0 state provisioned type card4ports
>     supported_types:
>        card1port card2ports card4ports
>   lc 1 state unprovisioned
>     supported_types:
>        card1port card2ports card4ports

Hi Jiri

> # Now activate the line card using debugfs. That emulates plug-in event
> # on real hardware:
> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
> $ ip link show eni10nl0p1
> 165: eni10nl0p1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
> # The carrier is UP now.

What is missing from the devlink lc view is what line card is actually
in the slot. Say if i provision for a card4port, but actually insert a
card2port. It would be nice to have something like:

 $ devlink lc
 netdevsim/netdevsim10:
   lc 0 state provisioned type card4ports
     supported_types:
        card1port card2ports card4ports
     inserted_type:
        card2ports;
   lc 1 state unprovisioned
     supported_types:
        card1port card2ports card4ports
     inserted_type:
        None

I assume if i prevision for card4ports but actually install a
card2ports, all the interfaces stay down?

Maybe

> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active

should actually be
    echo "card2ports" > /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active

so you can emulate somebody putting the wrong card in the slot?

    Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
                   ` (11 preceding siblings ...)
  2021-01-14  2:07 ` [patch net-next RFC 00/10] introduce line card support for modular switch Andrew Lunn
@ 2021-01-14  2:27 ` Jakub Kicinski
  2021-01-14  7:48   ` Jiri Pirko
  2021-01-14 22:58   ` Jacob Keller
  2021-01-15 15:43 ` Ido Schimmel
  2021-01-18 18:01 ` Edwin Peer
  14 siblings, 2 replies; 80+ messages in thread
From: Jakub Kicinski @ 2021-01-14  2:27 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, jacob.e.keller, roopa, mlxsw

On Wed, 13 Jan 2021 13:12:12 +0100 Jiri Pirko wrote:
> This patchset introduces support for modular switch systems.
> NVIDIA Mellanox SN4800 is an example of such. It contains 8 slots
> to accomodate line cards. Available line cards include:
> 16X 100GbE (QSFP28)
> 8X 200GbE (QSFP56)
> 4X 400GbE (QSFP-DD)
> 
> Similar to split cabels, it is essencial for the correctness of
> configuration and funcionality to treat the line card entities
> in the same way, no matter the line card is inserted or not.
> Meaning, the netdevice of a line card port cannot just disappear
> when line card is removed. Also, system admin needs to be able
> to apply configuration on netdevices belonging to line card port
> even before the linecard gets inserted.

I don't understand why that would be. Please provide reasoning, 
e.g. what the FW/HW limitation is.

> To resolve this, a concept of "provisioning" is introduced.
> The user may "provision" certain slot with a line card type.
> Driver then creates all instances (devlink ports, netdevices, etc)
> related to this line card type. The carrier of netdevices stays down.
> Once the line card is inserted and activated, the carrier of the
> related netdevices goes up.

Dunno what "line card" means for Mellovidia but I don't think 
the analogy of port splitting works. To my knowledge traditional
line cards often carry processors w/ full MACs etc. so I'd say 
plugging in a line card is much more like plugging in a new NIC.

There is no way to tell a breakout cable from normal one, so the
system has no chance to magically configure itself. Besides SFP
is just plugging a cable, not a module of the system.. 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-14  2:07 ` [patch net-next RFC 00/10] introduce line card support for modular switch Andrew Lunn
@ 2021-01-14  7:39   ` Jiri Pirko
  2021-01-14 22:56     ` Jacob Keller
  2021-01-19 11:56   ` Jiri Pirko
  1 sibling, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-14  7:39 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

Thu, Jan 14, 2021 at 03:07:18AM CET, andrew@lunn.ch wrote:
>> $ devlink lc provision netdevsim/netdevsim10 lc 0 type card4ports
>> $ devlink lc
>> netdevsim/netdevsim10:
>>   lc 0 state provisioned type card4ports
>>     supported_types:
>>        card1port card2ports card4ports
>>   lc 1 state unprovisioned
>>     supported_types:
>>        card1port card2ports card4ports
>
>Hi Jiri
>
>> # Now activate the line card using debugfs. That emulates plug-in event
>> # on real hardware:
>> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
>> $ ip link show eni10nl0p1
>> 165: eni10nl0p1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
>> # The carrier is UP now.
>
>What is missing from the devlink lc view is what line card is actually
>in the slot. Say if i provision for a card4port, but actually insert a
>card2port. It would be nice to have something like:
>
> $ devlink lc
> netdevsim/netdevsim10:
>   lc 0 state provisioned type card4ports
>     supported_types:
>        card1port card2ports card4ports
>     inserted_type:
>        card2ports;
>   lc 1 state unprovisioned
>     supported_types:
>        card1port card2ports card4ports
>     inserted_type:
>        None

I see. Yes, that might be doable. I'm noting this down.


>
>I assume if i prevision for card4ports but actually install a
>card2ports, all the interfaces stay down?

Yes, the card won't get activated in case or provision mismatch.


>
>Maybe
>
>> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
>
>should actually be
>    echo "card2ports" > /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
>
>so you can emulate somebody putting the wrong card in the slot?

Got you.

Thanks!

>
>    Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-14  2:27 ` Jakub Kicinski
@ 2021-01-14  7:48   ` Jiri Pirko
  2021-01-14 23:30     ` Jakub Kicinski
  2021-01-14 22:58   ` Jacob Keller
  1 sibling, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-14  7:48 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem, jacob.e.keller, roopa, mlxsw

Thu, Jan 14, 2021 at 03:27:16AM CET, kuba@kernel.org wrote:
>On Wed, 13 Jan 2021 13:12:12 +0100 Jiri Pirko wrote:
>> This patchset introduces support for modular switch systems.
>> NVIDIA Mellanox SN4800 is an example of such. It contains 8 slots
>> to accomodate line cards. Available line cards include:
>> 16X 100GbE (QSFP28)
>> 8X 200GbE (QSFP56)
>> 4X 400GbE (QSFP-DD)
>> 
>> Similar to split cabels, it is essencial for the correctness of
>> configuration and funcionality to treat the line card entities
>> in the same way, no matter the line card is inserted or not.
>> Meaning, the netdevice of a line card port cannot just disappear
>> when line card is removed. Also, system admin needs to be able
>> to apply configuration on netdevices belonging to line card port
>> even before the linecard gets inserted.
>
>I don't understand why that would be. Please provide reasoning, 
>e.g. what the FW/HW limitation is.

Well, for split cable, you need to be able to say:
port 2, split into 4. And you will have 4 netdevices. These netdevices
you can use to put into bridge, configure mtu, speeds, routes, etc.
These will exist no matter if the splitter cable is actually inserted or
not.

With linecards, this is very similar. By provisioning, you also create
certain number of ports, according to the linecard that you plan to
insert. And similarly to the splitter, the netdevices are created.

You may combine the linecard/splitter config when splitter cable is
connected to a linecard port. Then you provision a linecard,
port is going to appear and you will split this port.


>
>> To resolve this, a concept of "provisioning" is introduced.
>> The user may "provision" certain slot with a line card type.
>> Driver then creates all instances (devlink ports, netdevices, etc)
>> related to this line card type. The carrier of netdevices stays down.
>> Once the line card is inserted and activated, the carrier of the
>> related netdevices goes up.
>
>Dunno what "line card" means for Mellovidia but I don't think 
>the analogy of port splitting works. To my knowledge traditional
>line cards often carry processors w/ full MACs etc. so I'd say 
>plugging in a line card is much more like plugging in a new NIC.

No. It is basically a phy gearbox. The mac is not there. The interface
between asic and linecard are lanes. The linecards is basically an
attachable phy.


>
>There is no way to tell a breakout cable from normal one, so the
>system has no chance to magically configure itself. Besides SFP
>is just plugging a cable, not a module of the system.. 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-14  7:39   ` Jiri Pirko
@ 2021-01-14 22:56     ` Jacob Keller
  2021-01-15 14:19       ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Jacob Keller @ 2021-01-14 22:56 UTC (permalink / raw)
  To: Jiri Pirko, Andrew Lunn; +Cc: netdev, davem, kuba, roopa, mlxsw



On 1/13/2021 11:39 PM, Jiri Pirko wrote:
> Thu, Jan 14, 2021 at 03:07:18AM CET, andrew@lunn.ch wrote:
>>
>> I assume if i prevision for card4ports but actually install a
>> card2ports, all the interfaces stay down?
> 
> Yes, the card won't get activated in case or provision mismatch.
> 

If you're able to detect the line card type when plugging it in, I don't
understand why you need system administrator to pre-provision it using
such an interface? Wouldn't it make more sense to simply detect the
case? Or is it that you expect these things to be moved around and want
to make sure that you can configure the associated netdevices before the
card is plugged in?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-14  2:27 ` Jakub Kicinski
  2021-01-14  7:48   ` Jiri Pirko
@ 2021-01-14 22:58   ` Jacob Keller
  2021-01-14 23:20     ` Jakub Kicinski
  1 sibling, 1 reply; 80+ messages in thread
From: Jacob Keller @ 2021-01-14 22:58 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko; +Cc: netdev, davem, roopa, mlxsw



On 1/13/2021 6:27 PM, Jakub Kicinski wrote:
> On Wed, 13 Jan 2021 13:12:12 +0100 Jiri Pirko wrote:
>> This patchset introduces support for modular switch systems.
>> NVIDIA Mellanox SN4800 is an example of such. It contains 8 slots
>> to accomodate line cards. Available line cards include:
>> 16X 100GbE (QSFP28)
>> 8X 200GbE (QSFP56)
>> 4X 400GbE (QSFP-DD)
>>
>> Similar to split cabels, it is essencial for the correctness of
>> configuration and funcionality to treat the line card entities
>> in the same way, no matter the line card is inserted or not.
>> Meaning, the netdevice of a line card port cannot just disappear
>> when line card is removed. Also, system admin needs to be able
>> to apply configuration on netdevices belonging to line card port
>> even before the linecard gets inserted.
> 
> I don't understand why that would be. Please provide reasoning, 
> e.g. what the FW/HW limitation is.
> 

I agree, I wouldn't imagine that plugging or unplugging line cards is
expected to be done on a regular basis?

>> To resolve this, a concept of "provisioning" is introduced.
>> The user may "provision" certain slot with a line card type.
>> Driver then creates all instances (devlink ports, netdevices, etc)
>> related to this line card type. The carrier of netdevices stays down.
>> Once the line card is inserted and activated, the carrier of the
>> related netdevices goes up.
> 
> Dunno what "line card" means for Mellovidia but I don't think 
> the analogy of port splitting works. To my knowledge traditional
> line cards often carry processors w/ full MACs etc. so I'd say 
> plugging in a line card is much more like plugging in a new NIC.
> 

Even if they didn't...

> There is no way to tell a breakout cable from normal one, so the
> system has no chance to magically configure itself. Besides SFP
> is just plugging a cable, not a module of the system.. 
> 
If you're able to tell what is plugged in, why would we want to force
user to provision ahead of time? Wouldn't it make more sense to just
instantiate them as the card is plugged in? I guess it might be useful
to allow programming the netdevices before the cable is actually
inserted... I guess I don't see why that is valuable.

It would be sort of like if you provision a PCI slot before a device is
plugged into it..

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-14 22:58   ` Jacob Keller
@ 2021-01-14 23:20     ` Jakub Kicinski
  2021-01-15 14:40       ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Jakub Kicinski @ 2021-01-14 23:20 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Jiri Pirko, netdev, davem, roopa, mlxsw

On Thu, 14 Jan 2021 14:58:33 -0800 Jacob Keller wrote:
> > There is no way to tell a breakout cable from normal one, so the
> > system has no chance to magically configure itself. Besides SFP
> > is just plugging a cable, not a module of the system.. 
> >   
> If you're able to tell what is plugged in, why would we want to force
> user to provision ahead of time? Wouldn't it make more sense to just
> instantiate them as the card is plugged in? I guess it might be useful
> to allow programming the netdevices before the cable is actually
> inserted... I guess I don't see why that is valuable.
> 
> It would be sort of like if you provision a PCI slot before a device is
> plugged into it..

Yup, that's pretty much my thinking as well.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-14  7:48   ` Jiri Pirko
@ 2021-01-14 23:30     ` Jakub Kicinski
  2021-01-15 14:39       ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Jakub Kicinski @ 2021-01-14 23:30 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, jacob.e.keller, roopa, mlxsw

On Thu, 14 Jan 2021 08:48:04 +0100 Jiri Pirko wrote:
> Thu, Jan 14, 2021 at 03:27:16AM CET, kuba@kernel.org wrote:
> >On Wed, 13 Jan 2021 13:12:12 +0100 Jiri Pirko wrote:  
> >> This patchset introduces support for modular switch systems.
> >> NVIDIA Mellanox SN4800 is an example of such. It contains 8 slots
> >> to accomodate line cards. Available line cards include:
> >> 16X 100GbE (QSFP28)
> >> 8X 200GbE (QSFP56)
> >> 4X 400GbE (QSFP-DD)
> >> 
> >> Similar to split cabels, it is essencial for the correctness of
> >> configuration and funcionality to treat the line card entities
> >> in the same way, no matter the line card is inserted or not.
> >> Meaning, the netdevice of a line card port cannot just disappear
> >> when line card is removed. Also, system admin needs to be able
> >> to apply configuration on netdevices belonging to line card port
> >> even before the linecard gets inserted.  
> >
> >I don't understand why that would be. Please provide reasoning, 
> >e.g. what the FW/HW limitation is.  
> 
> Well, for split cable, you need to be able to say:
> port 2, split into 4. And you will have 4 netdevices. These netdevices
> you can use to put into bridge, configure mtu, speeds, routes, etc.
> These will exist no matter if the splitter cable is actually inserted or
> not.

The difference is that the line card is more detectable (I hope).

I'm not a SFP experts so maybe someone will correct me but AFAIU
the QSFP (for optics) is the same regardless of breakout. It's the
passive optical strands that are either bundled or not. So there is 
no way for the system to detect the cable type (AFAIK).

Or to put it differently IMO the netdev should be provisioned if the
system has a port into which user can plug in a cable. When there is 
a line card-sized hole in the chassis, I'd be surprised to see ports.

That said I never worked with real world routers so maybe that's what
they do. Maybe some with a Cisco router in the basement can tell us? :)

> With linecards, this is very similar. By provisioning, you also create
> certain number of ports, according to the linecard that you plan to
> insert. And similarly to the splitter, the netdevices are created.
> 
> You may combine the linecard/splitter config when splitter cable is
> connected to a linecard port. Then you provision a linecard,
> port is going to appear and you will split this port.
> 
> >> To resolve this, a concept of "provisioning" is introduced.
> >> The user may "provision" certain slot with a line card type.
> >> Driver then creates all instances (devlink ports, netdevices, etc)
> >> related to this line card type. The carrier of netdevices stays down.
> >> Once the line card is inserted and activated, the carrier of the
> >> related netdevices goes up.  
> >
> >Dunno what "line card" means for Mellovidia but I don't think 
> >the analogy of port splitting works. To my knowledge traditional
> >line cards often carry processors w/ full MACs etc. so I'd say 
> >plugging in a line card is much more like plugging in a new NIC.  
> 
> No. It is basically a phy gearbox. The mac is not there. The interface
> between asic and linecard are lanes. The linecards is basically an
> attachable phy.

If the device really needs this configuration / can't detect things
automatically, then we gotta do something like what you have.
The only question is do we still want to call it a line card.
Sounds more like a front panel module. At Netronome we called 
those phymods.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-14 22:56     ` Jacob Keller
@ 2021-01-15 14:19       ` Jiri Pirko
  0 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-15 14:19 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Andrew Lunn, netdev, davem, kuba, roopa, mlxsw

Thu, Jan 14, 2021 at 11:56:15PM CET, jacob.e.keller@intel.com wrote:
>
>
>On 1/13/2021 11:39 PM, Jiri Pirko wrote:
>> Thu, Jan 14, 2021 at 03:07:18AM CET, andrew@lunn.ch wrote:
>>>
>>> I assume if i prevision for card4ports but actually install a
>>> card2ports, all the interfaces stay down?
>> 
>> Yes, the card won't get activated in case or provision mismatch.
>> 
>
>If you're able to detect the line card type when plugging it in, I don't
>understand why you need system administrator to pre-provision it using
>such an interface? Wouldn't it make more sense to simply detect the
>case? Or is it that you expect these things to be moved around and want
>to make sure that you can configure the associated netdevices before the
>card is plugged in?

That is what I wrote in the cover letter.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-14 23:30     ` Jakub Kicinski
@ 2021-01-15 14:39       ` Jiri Pirko
  2021-01-15 19:26         ` Jakub Kicinski
  0 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-15 14:39 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem, jacob.e.keller, roopa, mlxsw

Fri, Jan 15, 2021 at 12:30:13AM CET, kuba@kernel.org wrote:
>On Thu, 14 Jan 2021 08:48:04 +0100 Jiri Pirko wrote:
>> Thu, Jan 14, 2021 at 03:27:16AM CET, kuba@kernel.org wrote:
>> >On Wed, 13 Jan 2021 13:12:12 +0100 Jiri Pirko wrote:  
>> >> This patchset introduces support for modular switch systems.
>> >> NVIDIA Mellanox SN4800 is an example of such. It contains 8 slots
>> >> to accomodate line cards. Available line cards include:
>> >> 16X 100GbE (QSFP28)
>> >> 8X 200GbE (QSFP56)
>> >> 4X 400GbE (QSFP-DD)
>> >> 
>> >> Similar to split cabels, it is essencial for the correctness of
>> >> configuration and funcionality to treat the line card entities
>> >> in the same way, no matter the line card is inserted or not.
>> >> Meaning, the netdevice of a line card port cannot just disappear
>> >> when line card is removed. Also, system admin needs to be able
>> >> to apply configuration on netdevices belonging to line card port
>> >> even before the linecard gets inserted.  
>> >
>> >I don't understand why that would be. Please provide reasoning, 
>> >e.g. what the FW/HW limitation is.  
>> 
>> Well, for split cable, you need to be able to say:
>> port 2, split into 4. And you will have 4 netdevices. These netdevices
>> you can use to put into bridge, configure mtu, speeds, routes, etc.
>> These will exist no matter if the splitter cable is actually inserted or
>> not.
>
>The difference is that the line card is more detectable (I hope).
>
>I'm not a SFP experts so maybe someone will correct me but AFAIU
>the QSFP (for optics) is the same regardless of breakout. It's the
>passive optical strands that are either bundled or not. So there is 
>no way for the system to detect the cable type (AFAIK).

For SFP module, you are able to detect those.

>
>Or to put it differently IMO the netdev should be provisioned if the
>system has a port into which user can plug in a cable. When there is 

Not really. For slit cables, the ports are provisioned not matter which
cable is connected, slitter 1->2/1->4 or 1->1 cable.


>a line card-sized hole in the chassis, I'd be surprised to see ports.
>
>That said I never worked with real world routers so maybe that's what
>they do. Maybe some with a Cisco router in the basement can tell us? :)

The need for provision/pre-configure splitter/linecard is that the
ports/netdevices do not disapper/reappear when you replace
splitter/linecard. Consider a faulty linecard with one port burned. You
just want to replace it with new one. And in that case, you really don't
want kernel to remove netdevices and possibly mess up routing for
example.


>
>> With linecards, this is very similar. By provisioning, you also create
>> certain number of ports, according to the linecard that you plan to
>> insert. And similarly to the splitter, the netdevices are created.
>> 
>> You may combine the linecard/splitter config when splitter cable is
>> connected to a linecard port. Then you provision a linecard,
>> port is going to appear and you will split this port.
>> 
>> >> To resolve this, a concept of "provisioning" is introduced.
>> >> The user may "provision" certain slot with a line card type.
>> >> Driver then creates all instances (devlink ports, netdevices, etc)
>> >> related to this line card type. The carrier of netdevices stays down.
>> >> Once the line card is inserted and activated, the carrier of the
>> >> related netdevices goes up.  
>> >
>> >Dunno what "line card" means for Mellovidia but I don't think 
>> >the analogy of port splitting works. To my knowledge traditional
>> >line cards often carry processors w/ full MACs etc. so I'd say 
>> >plugging in a line card is much more like plugging in a new NIC.  
>> 
>> No. It is basically a phy gearbox. The mac is not there. The interface
>> between asic and linecard are lanes. The linecards is basically an
>> attachable phy.
>
>If the device really needs this configuration / can't detect things
>automatically, then we gotta do something like what you have.
>The only question is do we still want to call it a line card.
>Sounds more like a front panel module. At Netronome we called 
>those phymods.

Sure, the name is up to the discussion. We call it "linecard"
internally. I don't care about the name.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-14 23:20     ` Jakub Kicinski
@ 2021-01-15 14:40       ` Jiri Pirko
  0 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-15 14:40 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jacob Keller, netdev, davem, roopa, mlxsw

Fri, Jan 15, 2021 at 12:20:58AM CET, kuba@kernel.org wrote:
>On Thu, 14 Jan 2021 14:58:33 -0800 Jacob Keller wrote:
>> > There is no way to tell a breakout cable from normal one, so the
>> > system has no chance to magically configure itself. Besides SFP
>> > is just plugging a cable, not a module of the system.. 
>> >   
>> If you're able to tell what is plugged in, why would we want to force
>> user to provision ahead of time? Wouldn't it make more sense to just
>> instantiate them as the card is plugged in? I guess it might be useful
>> to allow programming the netdevices before the cable is actually
>> inserted... I guess I don't see why that is valuable.
>> 
>> It would be sort of like if you provision a PCI slot before a device is
>> plugged into it..
>
>Yup, that's pretty much my thinking as well.

Please see my reply in the other sub-tread of this thread. Thanks!

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
                   ` (12 preceding siblings ...)
  2021-01-14  2:27 ` Jakub Kicinski
@ 2021-01-15 15:43 ` Ido Schimmel
  2021-01-15 16:55   ` Jiri Pirko
  2021-01-18 18:01 ` Edwin Peer
  14 siblings, 1 reply; 80+ messages in thread
From: Ido Schimmel @ 2021-01-15 15:43 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

On Wed, Jan 13, 2021 at 01:12:12PM +0100, Jiri Pirko wrote:
> # Create a new netdevsim device, with no ports and 2 line cards:
> $ echo "10 0 2" >/sys/bus/netdevsim/new_device
> $ devlink port # No ports are listed
> $ devlink lc
> netdevsim/netdevsim10:
>   lc 0 state unprovisioned
>     supported_types:
>        card1port card2ports card4ports
>   lc 1 state unprovisioned
>     supported_types:
>        card1port card2ports card4ports
> 
> # Note that driver advertizes supported line card types. In case of
> # netdevsim, these are 3.
> 
> $ devlink lc provision netdevsim/netdevsim10 lc 0 type card4ports

Why do we need a separate command for that? You actually introduced
'DEVLINK_CMD_LINECARD_SET' in patch #1, but it's never used.

I prefer:

devlink lc set netdevsim/netdevsim10 index 0 state provision type card4ports
devlink lc set netdevsim/netdevsim10 index 0 state unprovision

It is consistent with the GET/SET/NEW/DEL pattern used by other
commands.

> $ devlink lc
> netdevsim/netdevsim10:
>   lc 0 state provisioned type card4ports
>     supported_types:
>        card1port card2ports card4ports
>   lc 1 state unprovisioned
>     supported_types:
>        card1port card2ports card4ports
> $ devlink port
> netdevsim/netdevsim10/1000: type eth netdev eni10nl0p1 flavour physical lc 0 port 1 splittable false
> netdevsim/netdevsim10/1001: type eth netdev eni10nl0p2 flavour physical lc 0 port 2 splittable false
> netdevsim/netdevsim10/1002: type eth netdev eni10nl0p3 flavour physical lc 0 port 3 splittable false
> netdevsim/netdevsim10/1003: type eth netdev eni10nl0p4 flavour physical lc 0 port 4 splittable false
> #                                                 ^^                    ^^^^
> #                                     netdev name adjusted          index of a line card this port belongs to
> 
> $ ip link set eni10nl0p1 up 
> $ ip link show eni10nl0p1   
> 165: eni10nl0p1: <NO-CARRIER,BROADCAST,NOARP,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
> 
> # Now activate the line card using debugfs. That emulates plug-in event
> # on real hardware:
> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
> $ ip link show eni10nl0p1
> 165: eni10nl0p1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
> # The carrier is UP now.
> 
> Jiri Pirko (10):
>   devlink: add support to create line card and expose to user
>   devlink: implement line card provisioning
>   devlink: implement line card active state
>   devlink: append split port number to the port name
>   devlink: add port to line card relationship set
>   netdevsim: introduce line card support
>   netdevsim: allow port objects to be linked with line cards
>   netdevsim: create devlink line card object and implement provisioning
>   netdevsim: implement line card activation
>   selftests: add netdevsim devlink lc test
> 
>  drivers/net/netdevsim/bus.c                   |  21 +-
>  drivers/net/netdevsim/dev.c                   | 370 ++++++++++++++-
>  drivers/net/netdevsim/netdev.c                |   2 +
>  drivers/net/netdevsim/netdevsim.h             |  23 +
>  include/net/devlink.h                         |  44 ++
>  include/uapi/linux/devlink.h                  |  25 +
>  net/core/devlink.c                            | 443 +++++++++++++++++-
>  .../drivers/net/netdevsim/devlink.sh          |  62 ++-
>  8 files changed, 964 insertions(+), 26 deletions(-)
> 
> -- 
> 2.26.2
> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 01/10] devlink: add support to create line card and expose to user
  2021-01-13 12:12 ` [patch net-next RFC 01/10] devlink: add support to create line card and expose to user Jiri Pirko
@ 2021-01-15 15:47   ` Ido Schimmel
  0 siblings, 0 replies; 80+ messages in thread
From: Ido Schimmel @ 2021-01-15 15:47 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

On Wed, Jan 13, 2021 at 01:12:13PM +0100, Jiri Pirko wrote:
> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> index cf89c318f2ac..e5ed0522591f 100644
> --- a/include/uapi/linux/devlink.h
> +++ b/include/uapi/linux/devlink.h
> @@ -126,6 +126,11 @@ enum devlink_command {
>  
>  	DEVLINK_CMD_HEALTH_REPORTER_TEST,
>  
> +	DEVLINK_CMD_LINECARD_GET,		/* can dump */
> +	DEVLINK_CMD_LINECARD_SET,

Never used (but should)

> +	DEVLINK_CMD_LINECARD_NEW,
> +	DEVLINK_CMD_LINECARD_DEL,
> +
>  	/* add new commands above here */
>  	__DEVLINK_CMD_MAX,
>  	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
> @@ -529,6 +534,8 @@ enum devlink_attr {
>  	DEVLINK_ATTR_RELOAD_ACTION_INFO,        /* nested */
>  	DEVLINK_ATTR_RELOAD_ACTION_STATS,       /* nested */
>  
> +	DEVLINK_ATTR_LINECARD_INDEX,		/* u32 */
> +
>  	/* add new attributes above here, update the policy in devlink.c */
>  
>  	__DEVLINK_ATTR_MAX,

[...]

>  
> +/**
> + *	devlink_linecard_register - Register devlink linecard

Does not match function name

> + *
> + *	@devlink: devlink
> + *	@devlink_linecard: devlink linecard
> + *	@linecard_index: driver-specific numerical identifier of the linecard
> + *
> + *	Create devlink linecard instance with provided linecard index.
> + *	Caller can use any indexing, even hw-related one.
> + */
> +struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
> +						 unsigned int linecard_index)
> +{
> +	struct devlink_linecard *linecard;
> +
> +	mutex_lock(&devlink->lock);
> +	if (devlink_linecard_index_exists(devlink, linecard_index)) {
> +		mutex_unlock(&devlink->lock);
> +		return ERR_PTR(-EEXIST);
> +	}
> +
> +	linecard = kzalloc(sizeof(*linecard), GFP_KERNEL);
> +	if (!linecard)
> +		return ERR_PTR(-ENOMEM);
> +
> +	linecard->devlink = devlink;
> +	linecard->index = linecard_index;
> +	list_add_tail(&linecard->list, &devlink->linecard_list);
> +	mutex_unlock(&devlink->lock);
> +	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
> +	return linecard;
> +}
> +EXPORT_SYMBOL_GPL(devlink_linecard_create);
> +
> +/**
> + *	devlink_linecard_destroy - Destroy devlink linecard
> + *
> + *	@devlink_linecard: devlink linecard
> + */
> +void devlink_linecard_destroy(struct devlink_linecard *linecard)
> +{
> +	struct devlink *devlink = linecard->devlink;
> +
> +	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_DEL);
> +	mutex_lock(&devlink->lock);
> +	list_del(&linecard->list);
> +	mutex_unlock(&devlink->lock);
> +}
> +EXPORT_SYMBOL_GPL(devlink_linecard_create);
> +
>  int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
>  			u32 size, u16 ingress_pools_count,
>  			u16 egress_pools_count, u16 ingress_tc_count,
> -- 
> 2.26.2
> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 02/10] devlink: implement line card provisioning
  2021-01-13 12:12 ` [patch net-next RFC 02/10] devlink: implement line card provisioning Jiri Pirko
@ 2021-01-15 16:03   ` Ido Schimmel
  2021-01-15 16:51     ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Ido Schimmel @ 2021-01-15 16:03 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

On Wed, Jan 13, 2021 at 01:12:14PM +0100, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
> 
> In order to be able to configure all needed stuff on a port/netdevice
> of a line card without the line card being present, introduce line card
> provisioning. Basically provisioning will create a placeholder for
> instances (ports/netdevices) for a line card type.
> 
> Allow the user to query the supported line card types over line card
> get command. Then implement two netlink commands to allow user to
> provision/unprovision the line card with selected line card type.
> 
> On the driver API side, add provision/unprovision ops and supported
> types array to be advertised. Upon provision op call, the driver should
> take care of creating the instances for the particular line card type.
> Introduce provision_set/clear() functions to be called by the driver
> once the provisioning/unprovisioning is done on its side.
> 
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
> ---
>  include/net/devlink.h        |  31 +++++++-
>  include/uapi/linux/devlink.h |  17 +++++
>  net/core/devlink.c           | 141 ++++++++++++++++++++++++++++++++++-
>  3 files changed, 185 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index 67c2547d5ef9..854abd53e9ea 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -139,10 +139,33 @@ struct devlink_port {
>  	struct mutex reporters_lock; /* Protects reporter_list */
>  };
>  
> +struct devlink_linecard_ops;
> +
>  struct devlink_linecard {
>  	struct list_head list;
>  	struct devlink *devlink;
>  	unsigned int index;
> +	const struct devlink_linecard_ops *ops;
> +	void *priv;
> +	enum devlink_linecard_state state;
> +	const char *provisioned_type;
> +};
> +
> +/**
> + * struct devlink_linecard_ops - Linecard operations
> + * @supported_types: array of supported types of linecards
> + * @supported_types_count: number of elements in the array above
> + * @provision: callback to provision the linecard slot with certain
> + *	       type of linecard
> + * @unprovision: callback to unprovision the linecard slot
> + */
> +struct devlink_linecard_ops {
> +	const char **supported_types;
> +	unsigned int supported_types_count;
> +	int (*provision)(struct devlink_linecard *linecard, void *priv,
> +			 u32 type_index, struct netlink_ext_ack *extack);
> +	int (*unprovision)(struct devlink_linecard *linecard, void *priv,
> +			   struct netlink_ext_ack *extack);
>  };
>  
>  struct devlink_sb_pool_info {
> @@ -1414,9 +1437,13 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro
>  				   u16 pf, bool external);
>  void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
>  				   u16 pf, u16 vf, bool external);
> -struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
> -						 unsigned int linecard_index);
> +struct devlink_linecard *
> +devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index,
> +			const struct devlink_linecard_ops *ops, void *priv);
>  void devlink_linecard_destroy(struct devlink_linecard *linecard);
> +void devlink_linecard_provision_set(struct devlink_linecard *linecard,
> +				    u32 type_index);
> +void devlink_linecard_provision_clear(struct devlink_linecard *linecard);
>  int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
>  			u32 size, u16 ingress_pools_count,
>  			u16 egress_pools_count, u16 ingress_tc_count,
> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> index e5ed0522591f..4111ddcc000b 100644
> --- a/include/uapi/linux/devlink.h
> +++ b/include/uapi/linux/devlink.h
> @@ -131,6 +131,9 @@ enum devlink_command {
>  	DEVLINK_CMD_LINECARD_NEW,
>  	DEVLINK_CMD_LINECARD_DEL,
>  
> +	DEVLINK_CMD_LINECARD_PROVISION,
> +	DEVLINK_CMD_LINECARD_UNPROVISION,

I do not really see the point in these two commands. Better extend
DEVLINK_CMD_LINECARD_SET to carry these attributes.

> +
>  	/* add new commands above here */
>  	__DEVLINK_CMD_MAX,
>  	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
> @@ -329,6 +332,17 @@ enum devlink_reload_limit {
>  
>  #define DEVLINK_RELOAD_LIMITS_VALID_MASK (_BITUL(__DEVLINK_RELOAD_LIMIT_MAX) - 1)
>  
> +enum devlink_linecard_state {
> +	DEVLINK_LINECARD_STATE_UNSPEC,
> +	DEVLINK_LINECARD_STATE_UNPROVISIONED,
> +	DEVLINK_LINECARD_STATE_UNPROVISIONING,
> +	DEVLINK_LINECARD_STATE_PROVISIONING,

Can you explain why these two states are necessary? Any reason the
provision operation can't be synchronous? This is somewhat explained in
patch #8, but it should really be explained here. Changelog says:

"To avoid deadlock and to mimic actual HW flow, use workqueue
to add/del ports during provisioning as the port add/del calls
devlink_port_register/unregister() which take devlink mutex."

The deadlock is not really a reason to have these states.
'DEVLINK_CMD_PORT_SPLIT' also calls devlink_port_register() /
devlink_port_unregister() and the deadlock is solved by:

'internal_flags = DEVLINK_NL_FLAG_NO_LOCK'

A hardware flow the requires it is something else...

> +	DEVLINK_LINECARD_STATE_PROVISIONED,
> +
> +	__DEVLINK_LINECARD_STATE_MAX,
> +	DEVLINK_LINECARD_STATE_MAX = __DEVLINK_LINECARD_STATE_MAX - 1
> +};
> +
>  enum devlink_attr {
>  	/* don't change the order or add anything between, this is ABI! */
>  	DEVLINK_ATTR_UNSPEC,
> @@ -535,6 +549,9 @@ enum devlink_attr {
>  	DEVLINK_ATTR_RELOAD_ACTION_STATS,       /* nested */
>  
>  	DEVLINK_ATTR_LINECARD_INDEX,		/* u32 */
> +	DEVLINK_ATTR_LINECARD_STATE,		/* u8 */
> +	DEVLINK_ATTR_LINECARD_TYPE,		/* string */
> +	DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES,	/* nested */
>  
>  	/* add new attributes above here, update the policy in devlink.c */
>  
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index 564e921133cf..434eecc310c3 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -1192,7 +1192,9 @@ static int devlink_nl_linecard_fill(struct sk_buff *msg,
>  				    u32 seq, int flags,
>  				    struct netlink_ext_ack *extack)
>  {
> +	struct nlattr *attr;
>  	void *hdr;
> +	int i;
>  
>  	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
>  	if (!hdr)
> @@ -1202,6 +1204,22 @@ static int devlink_nl_linecard_fill(struct sk_buff *msg,
>  		goto nla_put_failure;
>  	if (nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX, linecard->index))
>  		goto nla_put_failure;
> +	if (nla_put_u8(msg, DEVLINK_ATTR_LINECARD_STATE, linecard->state))
> +		goto nla_put_failure;
> +	if (linecard->state >= DEVLINK_LINECARD_STATE_PROVISIONED &&

This assumes that every state added after provisioned should report the
type. Better to check for the specific states

> +	    nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE,
> +			   linecard->provisioned_type))
> +		goto nla_put_failure;
> +
> +	attr = nla_nest_start(msg, DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES);
> +	if (!attr)
> +		return -EMSGSIZE;
> +	for (i = 0; i < linecard->ops->supported_types_count; i++) {
> +		if (nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE,
> +				   linecard->ops->supported_types[i]))
> +			goto nla_put_failure;
> +	}
> +	nla_nest_end(msg, attr);
>  
>  	genlmsg_end(msg, hdr);
>  	return 0;
> @@ -1300,6 +1318,68 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg,
>  	return msg->len;
>  }
>  
> +static int devlink_nl_cmd_linecard_provision_doit(struct sk_buff *skb,
> +						  struct genl_info *info)
> +{
> +	struct devlink_linecard *linecard = info->user_ptr[1];
> +	const char *type;
> +	int i;
> +
> +	if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) {
> +		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is currently being provisioned");
> +		return -EBUSY;
> +	}
> +	if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) {
> +		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is currently being unprovisioned");
> +		return -EBUSY;
> +	}
> +	if (linecard->state != DEVLINK_LINECARD_STATE_UNPROVISIONED) {
> +		NL_SET_ERR_MSG_MOD(info->extack, "Linecard already provisioned");
> +		return -EBUSY;
> +	}
> +
> +	if (!info->attrs[DEVLINK_ATTR_LINECARD_TYPE]) {
> +		NL_SET_ERR_MSG_MOD(info->extack, "Provision type not provided");
> +		return -EINVAL;
> +	}
> +
> +	type = nla_data(info->attrs[DEVLINK_ATTR_LINECARD_TYPE]);
> +	for (i = 0; i < linecard->ops->supported_types_count; i++) {
> +		if (!strcmp(linecard->ops->supported_types[i], type)) {
> +			linecard->state = DEVLINK_LINECARD_STATE_PROVISIONING;
> +			devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
> +			return linecard->ops->provision(linecard,
> +							linecard->priv, i,
> +							info->extack);

So if this fails user space will see 'provisioning' although nothing is
being provisioned... Better to set the state and notify if this call did
not fail

> +		}
> +	}
> +	NL_SET_ERR_MSG_MOD(info->extack, "Unsupported provision type provided");
> +	return -EINVAL;
> +}
> +
> +static int devlink_nl_cmd_linecard_unprovision_doit(struct sk_buff *skb,
> +						    struct genl_info *info)
> +{
> +	struct devlink_linecard *linecard = info->user_ptr[1];
> +
> +	if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) {
> +		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is currently being provisioned");
> +		return -EBUSY;
> +	}
> +	if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) {
> +		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is currently being unprovisioned");
> +		return -EBUSY;
> +	}
> +	if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONED) {
> +		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is not provisioned");
> +		return -EOPNOTSUPP;
> +	}
> +	linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONING;
> +	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
> +	return linecard->ops->unprovision(linecard, linecard->priv,
> +					  info->extack);
> +}
> +
>  static int devlink_nl_sb_fill(struct sk_buff *msg, struct devlink *devlink,
>  			      struct devlink_sb *devlink_sb,
>  			      enum devlink_command cmd, u32 portid,
> @@ -7759,6 +7839,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
>  							DEVLINK_RELOAD_ACTION_MAX),
>  	[DEVLINK_ATTR_RELOAD_LIMITS] = NLA_POLICY_BITFIELD32(DEVLINK_RELOAD_LIMITS_VALID_MASK),
>  	[DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 },
> +	[DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING },
>  };
>  
>  static const struct genl_small_ops devlink_nl_ops[] = {
> @@ -7806,6 +7887,20 @@ static const struct genl_small_ops devlink_nl_ops[] = {
>  		.internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD,
>  		/* can be retrieved by unprivileged users */
>  	},
> +	{
> +		.cmd = DEVLINK_CMD_LINECARD_PROVISION,
> +		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
> +		.doit = devlink_nl_cmd_linecard_provision_doit,
> +		.flags = GENL_ADMIN_PERM,
> +		.internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD,
> +	},
> +	{
> +		.cmd = DEVLINK_CMD_LINECARD_UNPROVISION,
> +		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
> +		.doit = devlink_nl_cmd_linecard_unprovision_doit,
> +		.flags = GENL_ADMIN_PERM,
> +		.internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD,
> +	},
>  	{
>  		.cmd = DEVLINK_CMD_SB_GET,
>  		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
> @@ -8613,11 +8708,17 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
>   *	Create devlink linecard instance with provided linecard index.
>   *	Caller can use any indexing, even hw-related one.
>   */
> -struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
> -						 unsigned int linecard_index)
> +struct devlink_linecard *
> +devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index,
> +			const struct devlink_linecard_ops *ops, void *priv)
>  {
>  	struct devlink_linecard *linecard;
>  
> +	if (WARN_ON(!ops || !ops->supported_types ||
> +		    !ops->supported_types_count ||
> +		    !ops->provision || !ops->unprovision))
> +		return ERR_PTR(-EINVAL);
> +
>  	mutex_lock(&devlink->lock);
>  	if (devlink_linecard_index_exists(devlink, linecard_index)) {
>  		mutex_unlock(&devlink->lock);
> @@ -8630,6 +8731,9 @@ struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
>  
>  	linecard->devlink = devlink;
>  	linecard->index = linecard_index;
> +	linecard->ops = ops;
> +	linecard->priv = priv;
> +	linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED;
>  	list_add_tail(&linecard->list, &devlink->linecard_list);
>  	mutex_unlock(&devlink->lock);
>  	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
> @@ -8653,6 +8757,39 @@ void devlink_linecard_destroy(struct devlink_linecard *linecard)
>  }
>  EXPORT_SYMBOL_GPL(devlink_linecard_create);
>  
> +/**
> + *	devlink_linecard_provision_set - Set provisioning on linecard

'Set linecard as provisioned' maybe?

> + *
> + *	@devlink_linecard: devlink linecard
> + *	@type_index: index of the linecard type (in array of types in ops)
> + */
> +void devlink_linecard_provision_set(struct devlink_linecard *linecard,
> +				    u32 type_index)
> +{
> +	WARN_ON(type_index >= linecard->ops->supported_types_count);

Wouldn't this explode below when you use the index to access the array?
Maybe better to just warn and return

> +	mutex_lock(&linecard->devlink->lock);
> +	linecard->state = DEVLINK_LINECARD_STATE_PROVISIONED;
> +	linecard->provisioned_type = linecard->ops->supported_types[type_index];
> +	mutex_unlock(&linecard->devlink->lock);
> +	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
> +}
> +EXPORT_SYMBOL_GPL(devlink_linecard_provision_set);
> +
> +/**
> + *	devlink_linecard_provision_clear - Clear provisioning on linecard

'Set linecard as unprovisioned' maybe?

> + *
> + *	@devlink_linecard: devlink linecard
> + */
> +void devlink_linecard_provision_clear(struct devlink_linecard *linecard)
> +{
> +	mutex_lock(&linecard->devlink->lock);
> +	linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED;
> +	linecard->provisioned_type = NULL;
> +	mutex_unlock(&linecard->devlink->lock);
> +	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
> +}
> +EXPORT_SYMBOL_GPL(devlink_linecard_provision_clear);
> +
>  int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
>  			u32 size, u16 ingress_pools_count,
>  			u16 egress_pools_count, u16 ingress_tc_count,
> -- 
> 2.26.2
> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 03/10] devlink: implement line card active state
  2021-01-13 12:12 ` [patch net-next RFC 03/10] devlink: implement line card active state Jiri Pirko
@ 2021-01-15 16:06   ` Ido Schimmel
  2021-01-15 16:52     ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Ido Schimmel @ 2021-01-15 16:06 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

On Wed, Jan 13, 2021 at 01:12:15PM +0100, Jiri Pirko wrote:
> +/**
> + *	devlink_linecard_deactivate - Set linecard deactive

Set linecard as inactive

> + *
> + *	@devlink_linecard: devlink linecard
> + */
> +void devlink_linecard_deactivate(struct devlink_linecard *linecard)
> +{
> +	mutex_lock(&linecard->devlink->lock);
> +	WARN_ON(linecard->state != DEVLINK_LINECARD_STATE_ACTIVE);
> +	linecard->state = DEVLINK_LINECARD_STATE_PROVISIONED;
> +	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
> +	mutex_unlock(&linecard->devlink->lock);
> +}
> +EXPORT_SYMBOL_GPL(devlink_linecard_deactivate);
> +
> +/**
> + *	devlink_linecard_is_active - Check if active
> + *
> + *	@devlink_linecard: devlink linecard
> + */
> +bool devlink_linecard_is_active(struct devlink_linecard *linecard)
> +{
> +	bool active;
> +
> +	mutex_lock(&linecard->devlink->lock);
> +	active = linecard->state == DEVLINK_LINECARD_STATE_ACTIVE;
> +	mutex_unlock(&linecard->devlink->lock);
> +	return active;
> +}
> +EXPORT_SYMBOL_GPL(devlink_linecard_is_active);
> +
>  int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
>  			u32 size, u16 ingress_pools_count,
>  			u16 egress_pools_count, u16 ingress_tc_count,
> -- 
> 2.26.2
> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 05/10] devlink: add port to line card relationship set
  2021-01-13 12:12 ` [patch net-next RFC 05/10] devlink: add port to line card relationship set Jiri Pirko
@ 2021-01-15 16:10   ` Ido Schimmel
  2021-01-15 16:53     ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Ido Schimmel @ 2021-01-15 16:10 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

On Wed, Jan 13, 2021 at 01:12:17PM +0100, Jiri Pirko wrote:
> index ec00cd94c626..cb911b6fdeda 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -137,6 +137,7 @@ struct devlink_port {
>  	struct delayed_work type_warn_dw;
>  	struct list_head reporter_list;
>  	struct mutex reporters_lock; /* Protects reporter_list */
> +	struct devlink_linecard *linecard;
>  };
>  
>  struct devlink_linecard_ops;
> @@ -1438,6 +1439,8 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro
>  				   u16 pf, bool external);
>  void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
>  				   u16 pf, u16 vf, bool external);
> +void devlink_port_linecard_set(struct devlink_port *devlink_port,
> +			       struct devlink_linecard *linecard);
>  struct devlink_linecard *
>  devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index,
>  			const struct devlink_linecard_ops *ops, void *priv);
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index 347976b88404..2faa30cc5cce 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -855,6 +855,10 @@ static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
>  		goto nla_put_failure;
>  	if (devlink_nl_port_function_attrs_put(msg, devlink_port, extack))
>  		goto nla_put_failure;
> +	if (devlink_port->linecard &&
> +	    nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX,
> +			devlink_port->linecard->index))
> +		goto nla_put_failure;
>  
>  	genlmsg_end(msg, hdr);
>  	return 0;
> @@ -8642,6 +8646,21 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro
>  }
>  EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set);
>  
> +/**
> + *	devlink_port_linecard_set - Link port with a linecard
> + *
> + *	@devlink_port: devlink port
> + *	@devlink_linecard: devlink linecard
> + */
> +void devlink_port_linecard_set(struct devlink_port *devlink_port,
> +			       struct devlink_linecard *linecard)
> +{
> +	if (WARN_ON(devlink_port->registered))
> +		return;
> +	devlink_port->linecard = linecard;

We already have devlink_port_attrs_set() that is called before the port
is registered, why not extend it to also set the linecard information?

> +}
> +EXPORT_SYMBOL_GPL(devlink_port_linecard_set);
> +
>  static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
>  					     char *name, size_t len)
>  {
> @@ -8654,7 +8673,11 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
>  	switch (attrs->flavour) {
>  	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
>  	case DEVLINK_PORT_FLAVOUR_VIRTUAL:
> -		n = snprintf(name, len, "p%u", attrs->phys.port_number);
> +		if (devlink_port->linecard)
> +			n = snprintf(name, len, "l%u",
> +				     devlink_port->linecard->index);
> +		n += snprintf(name + n, len - n, "p%u",
> +			      attrs->phys.port_number);
>  		if (attrs->split)
>  			n += snprintf(name + n, len - n, "s%u",
>  				      attrs->phys.split_subport_number);
> -- 
> 2.26.2
> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 08/10] netdevsim: create devlink line card object and implement provisioning
  2021-01-13 12:12 ` [patch net-next RFC 08/10] netdevsim: create devlink line card object and implement provisioning Jiri Pirko
@ 2021-01-15 16:30   ` Ido Schimmel
  2021-01-15 16:54     ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Ido Schimmel @ 2021-01-15 16:30 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

On Wed, Jan 13, 2021 at 01:12:20PM +0100, Jiri Pirko wrote:
> @@ -977,6 +1012,9 @@ static int __nsim_dev_port_add(struct nsim_dev *nsim_dev,
>  	memcpy(attrs.switch_id.id, nsim_dev->switch_id.id, nsim_dev->switch_id.id_len);
>  	attrs.switch_id.id_len = nsim_dev->switch_id.id_len;
>  	devlink_port_attrs_set(devlink_port, &attrs);
> +	if (nsim_dev_linecard)
> +		devlink_port_linecard_set(devlink_port,
> +					  nsim_dev_linecard->devlink_linecard);

Should be fold into devlink_port_attrs_set()

>  	err = devlink_port_register(priv_to_devlink(nsim_dev), devlink_port,
>  				    nsim_dev_port->port_index);
>  	if (err)
> @@ -1053,10 +1091,88 @@ static int nsim_dev_port_add_all(struct nsim_dev *nsim_dev,
>  	return err;
>  }
>  
> +static void nsim_dev_linecard_provision_work(struct work_struct *work)
> +{
> +	struct nsim_dev_linecard *nsim_dev_linecard;
> +	struct nsim_bus_dev *nsim_bus_dev;
> +	int err;
> +	int i;
> +
> +	nsim_dev_linecard = container_of(work, struct nsim_dev_linecard,
> +					 provision_work);
> +
> +	nsim_bus_dev = nsim_dev_linecard->nsim_dev->nsim_bus_dev;
> +	for (i = 0; i < nsim_dev_linecard_port_count(nsim_dev_linecard); i++) {
> +		err = nsim_dev_port_add(nsim_bus_dev, nsim_dev_linecard, i);
> +		if (err)
> +			goto err_port_del_all;
> +	}
> +	nsim_dev_linecard->provisioned = true;
> +	devlink_linecard_provision_set(nsim_dev_linecard->devlink_linecard,
> +				       nsim_dev_linecard->type_index);
> +	return;
> +
> +err_port_del_all:
> +	for (i--; i >= 0; i--)
> +		nsim_dev_port_del(nsim_bus_dev, nsim_dev_linecard, i);
> +	devlink_linecard_provision_clear(nsim_dev_linecard->devlink_linecard);
> +}
> +
> +static int nsim_dev_linecard_provision(struct devlink_linecard *linecard,
> +				       void *priv, u32 type_index,
> +				       struct netlink_ext_ack *extack)
> +{
> +	struct nsim_dev_linecard *nsim_dev_linecard = priv;
> +
> +	nsim_dev_linecard->type_index = type_index;
> +	INIT_WORK(&nsim_dev_linecard->provision_work,
> +		  nsim_dev_linecard_provision_work);
> +	schedule_work(&nsim_dev_linecard->provision_work);
> +
> +	return 0;
> +}
> +
> +static void nsim_dev_linecard_unprovision_work(struct work_struct *work)
> +{
> +	struct nsim_dev_linecard *nsim_dev_linecard;
> +	struct nsim_bus_dev *nsim_bus_dev;
> +	int i;
> +
> +	nsim_dev_linecard = container_of(work, struct nsim_dev_linecard,
> +					 provision_work);
> +
> +	nsim_bus_dev = nsim_dev_linecard->nsim_dev->nsim_bus_dev;
> +	nsim_dev_linecard->provisioned = false;
> +	devlink_linecard_provision_clear(nsim_dev_linecard->devlink_linecard);
> +	for (i = 0; i < nsim_dev_linecard_port_count(nsim_dev_linecard); i++)
> +		nsim_dev_port_del(nsim_bus_dev, nsim_dev_linecard, i);
> +}
> +
> +static int nsim_dev_linecard_unprovision(struct devlink_linecard *linecard,
> +					 void *priv,
> +					 struct netlink_ext_ack *extack)
> +{
> +	struct nsim_dev_linecard *nsim_dev_linecard = priv;
> +
> +	INIT_WORK(&nsim_dev_linecard->provision_work,
> +		  nsim_dev_linecard_unprovision_work);
> +	schedule_work(&nsim_dev_linecard->provision_work);
> +
> +	return 0;
> +}
> +
> +static const struct devlink_linecard_ops nsim_dev_linecard_ops = {
> +	.supported_types = nsim_dev_linecard_supported_types,
> +	.supported_types_count = ARRAY_SIZE(nsim_dev_linecard_supported_types),
> +	.provision = nsim_dev_linecard_provision,
> +	.unprovision = nsim_dev_linecard_unprovision,
> +};
> +
>  static int __nsim_dev_linecard_add(struct nsim_dev *nsim_dev,
>  				   unsigned int linecard_index)
>  {
>  	struct nsim_dev_linecard *nsim_dev_linecard;
> +	struct devlink_linecard *devlink_linecard;
>  	int err;
>  
>  	nsim_dev_linecard = kzalloc(sizeof(*nsim_dev_linecard), GFP_KERNEL);
> @@ -1066,14 +1182,27 @@ static int __nsim_dev_linecard_add(struct nsim_dev *nsim_dev,
>  	nsim_dev_linecard->linecard_index = linecard_index;
>  	INIT_LIST_HEAD(&nsim_dev_linecard->port_list);
>  
> +	devlink_linecard = devlink_linecard_create(priv_to_devlink(nsim_dev),
> +						   linecard_index,
> +						   &nsim_dev_linecard_ops,
> +						   nsim_dev_linecard);
> +	if (IS_ERR(devlink_linecard)) {
> +		err = PTR_ERR(devlink_linecard);
> +		goto err_linecard_free;
> +	}
> +
> +	nsim_dev_linecard->devlink_linecard = devlink_linecard;
> +
>  	err = nsim_dev_linecard_debugfs_init(nsim_dev, nsim_dev_linecard);
>  	if (err)
> -		goto err_linecard_free;
> +		goto err_dl_linecard_destroy;
>  
>  	list_add(&nsim_dev_linecard->list, &nsim_dev->linecard_list);
>  
>  	return 0;
>  
> +err_dl_linecard_destroy:
> +	devlink_linecard_destroy(devlink_linecard);
>  err_linecard_free:
>  	kfree(nsim_dev_linecard);
>  	return err;
> @@ -1081,8 +1210,12 @@ static int __nsim_dev_linecard_add(struct nsim_dev *nsim_dev,
>  
>  static void __nsim_dev_linecard_del(struct nsim_dev_linecard *nsim_dev_linecard)
>  {
> +	struct devlink_linecard *devlink_linecard =
> +					nsim_dev_linecard->devlink_linecard;
> +
>  	list_del(&nsim_dev_linecard->list);
>  	nsim_dev_linecard_debugfs_exit(nsim_dev_linecard);

What about the delayed work? I believe it can run while you are
destroying the linecard, so cancel_delayed_work_sync() is needed

> +	devlink_linecard_destroy(devlink_linecard);
>  	kfree(nsim_dev_linecard);
>  }
>  
> diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
> index 88b61b9390bf..ab217b361416 100644
> --- a/drivers/net/netdevsim/netdevsim.h
> +++ b/drivers/net/netdevsim/netdevsim.h
> @@ -196,10 +196,14 @@ struct nsim_dev;
>  
>  struct nsim_dev_linecard {
>  	struct list_head list;
> +	struct devlink_linecard *devlink_linecard;
>  	struct nsim_dev *nsim_dev;
>  	struct list_head port_list;
>  	unsigned int linecard_index;
>  	struct dentry *ddir;
> +	bool provisioned;
> +	u32 type_index;
> +	struct work_struct provision_work;
>  };
>  
>  struct nsim_dev {
> -- 
> 2.26.2
> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 02/10] devlink: implement line card provisioning
  2021-01-15 16:03   ` Ido Schimmel
@ 2021-01-15 16:51     ` Jiri Pirko
  2021-01-15 18:09       ` Ido Schimmel
  0 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-15 16:51 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

Fri, Jan 15, 2021 at 05:03:19PM CET, idosch@idosch.org wrote:
>On Wed, Jan 13, 2021 at 01:12:14PM +0100, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@nvidia.com>
>> 
>> In order to be able to configure all needed stuff on a port/netdevice
>> of a line card without the line card being present, introduce line card
>> provisioning. Basically provisioning will create a placeholder for
>> instances (ports/netdevices) for a line card type.
>> 
>> Allow the user to query the supported line card types over line card
>> get command. Then implement two netlink commands to allow user to
>> provision/unprovision the line card with selected line card type.
>> 
>> On the driver API side, add provision/unprovision ops and supported
>> types array to be advertised. Upon provision op call, the driver should
>> take care of creating the instances for the particular line card type.
>> Introduce provision_set/clear() functions to be called by the driver
>> once the provisioning/unprovisioning is done on its side.
>> 
>> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>> ---
>>  include/net/devlink.h        |  31 +++++++-
>>  include/uapi/linux/devlink.h |  17 +++++
>>  net/core/devlink.c           | 141 ++++++++++++++++++++++++++++++++++-
>>  3 files changed, 185 insertions(+), 4 deletions(-)
>> 
>> diff --git a/include/net/devlink.h b/include/net/devlink.h
>> index 67c2547d5ef9..854abd53e9ea 100644
>> --- a/include/net/devlink.h
>> +++ b/include/net/devlink.h
>> @@ -139,10 +139,33 @@ struct devlink_port {
>>  	struct mutex reporters_lock; /* Protects reporter_list */
>>  };
>>  
>> +struct devlink_linecard_ops;
>> +
>>  struct devlink_linecard {
>>  	struct list_head list;
>>  	struct devlink *devlink;
>>  	unsigned int index;
>> +	const struct devlink_linecard_ops *ops;
>> +	void *priv;
>> +	enum devlink_linecard_state state;
>> +	const char *provisioned_type;
>> +};
>> +
>> +/**
>> + * struct devlink_linecard_ops - Linecard operations
>> + * @supported_types: array of supported types of linecards
>> + * @supported_types_count: number of elements in the array above
>> + * @provision: callback to provision the linecard slot with certain
>> + *	       type of linecard
>> + * @unprovision: callback to unprovision the linecard slot
>> + */
>> +struct devlink_linecard_ops {
>> +	const char **supported_types;
>> +	unsigned int supported_types_count;
>> +	int (*provision)(struct devlink_linecard *linecard, void *priv,
>> +			 u32 type_index, struct netlink_ext_ack *extack);
>> +	int (*unprovision)(struct devlink_linecard *linecard, void *priv,
>> +			   struct netlink_ext_ack *extack);
>>  };
>>  
>>  struct devlink_sb_pool_info {
>> @@ -1414,9 +1437,13 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro
>>  				   u16 pf, bool external);
>>  void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
>>  				   u16 pf, u16 vf, bool external);
>> -struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
>> -						 unsigned int linecard_index);
>> +struct devlink_linecard *
>> +devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index,
>> +			const struct devlink_linecard_ops *ops, void *priv);
>>  void devlink_linecard_destroy(struct devlink_linecard *linecard);
>> +void devlink_linecard_provision_set(struct devlink_linecard *linecard,
>> +				    u32 type_index);
>> +void devlink_linecard_provision_clear(struct devlink_linecard *linecard);
>>  int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
>>  			u32 size, u16 ingress_pools_count,
>>  			u16 egress_pools_count, u16 ingress_tc_count,
>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>> index e5ed0522591f..4111ddcc000b 100644
>> --- a/include/uapi/linux/devlink.h
>> +++ b/include/uapi/linux/devlink.h
>> @@ -131,6 +131,9 @@ enum devlink_command {
>>  	DEVLINK_CMD_LINECARD_NEW,
>>  	DEVLINK_CMD_LINECARD_DEL,
>>  
>> +	DEVLINK_CMD_LINECARD_PROVISION,
>> +	DEVLINK_CMD_LINECARD_UNPROVISION,
>
>I do not really see the point in these two commands. Better extend
>DEVLINK_CMD_LINECARD_SET to carry these attributes.

Yeah, I was thinking about that. Not sure it is correct though. This is
single purpose command. It really does not change "an attribute" as the
"_SET" commands are usually doing. Consider extension of "_SET" by other
attributes. Then it looks wrong.


>
>> +
>>  	/* add new commands above here */
>>  	__DEVLINK_CMD_MAX,
>>  	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
>> @@ -329,6 +332,17 @@ enum devlink_reload_limit {
>>  
>>  #define DEVLINK_RELOAD_LIMITS_VALID_MASK (_BITUL(__DEVLINK_RELOAD_LIMIT_MAX) - 1)
>>  
>> +enum devlink_linecard_state {
>> +	DEVLINK_LINECARD_STATE_UNSPEC,
>> +	DEVLINK_LINECARD_STATE_UNPROVISIONED,
>> +	DEVLINK_LINECARD_STATE_UNPROVISIONING,
>> +	DEVLINK_LINECARD_STATE_PROVISIONING,
>
>Can you explain why these two states are necessary? Any reason the
>provision operation can't be synchronous? This is somewhat explained in
>patch #8, but it should really be explained here. Changelog says:
>
>"To avoid deadlock and to mimic actual HW flow, use workqueue
>to add/del ports during provisioning as the port add/del calls
>devlink_port_register/unregister() which take devlink mutex."
>
>The deadlock is not really a reason to have these states.

It is, need to avoid recursice locking

>'DEVLINK_CMD_PORT_SPLIT' also calls devlink_port_register() /
>devlink_port_unregister() and the deadlock is solved by:
>
>'internal_flags = DEVLINK_NL_FLAG_NO_LOCK'

Yeah, however, there, the port_index is passed down to the driver, not
the actual object pointer. That's why it can be done like that.

>
>A hardware flow the requires it is something else...

Hardware flow in case of Spectrum is async too.


>
>> +	DEVLINK_LINECARD_STATE_PROVISIONED,
>> +
>> +	__DEVLINK_LINECARD_STATE_MAX,
>> +	DEVLINK_LINECARD_STATE_MAX = __DEVLINK_LINECARD_STATE_MAX - 1
>> +};
>> +
>>  enum devlink_attr {
>>  	/* don't change the order or add anything between, this is ABI! */
>>  	DEVLINK_ATTR_UNSPEC,
>> @@ -535,6 +549,9 @@ enum devlink_attr {
>>  	DEVLINK_ATTR_RELOAD_ACTION_STATS,       /* nested */
>>  
>>  	DEVLINK_ATTR_LINECARD_INDEX,		/* u32 */
>> +	DEVLINK_ATTR_LINECARD_STATE,		/* u8 */
>> +	DEVLINK_ATTR_LINECARD_TYPE,		/* string */
>> +	DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES,	/* nested */
>>  
>>  	/* add new attributes above here, update the policy in devlink.c */
>>  
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index 564e921133cf..434eecc310c3 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -1192,7 +1192,9 @@ static int devlink_nl_linecard_fill(struct sk_buff *msg,
>>  				    u32 seq, int flags,
>>  				    struct netlink_ext_ack *extack)
>>  {
>> +	struct nlattr *attr;
>>  	void *hdr;
>> +	int i;
>>  
>>  	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
>>  	if (!hdr)
>> @@ -1202,6 +1204,22 @@ static int devlink_nl_linecard_fill(struct sk_buff *msg,
>>  		goto nla_put_failure;
>>  	if (nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX, linecard->index))
>>  		goto nla_put_failure;
>> +	if (nla_put_u8(msg, DEVLINK_ATTR_LINECARD_STATE, linecard->state))
>> +		goto nla_put_failure;
>> +	if (linecard->state >= DEVLINK_LINECARD_STATE_PROVISIONED &&
>
>This assumes that every state added after provisioned should report the
>type. Better to check for the specific states

Yes, that is correct assumption.


>
>> +	    nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE,
>> +			   linecard->provisioned_type))
>> +		goto nla_put_failure;
>> +
>> +	attr = nla_nest_start(msg, DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES);
>> +	if (!attr)
>> +		return -EMSGSIZE;
>> +	for (i = 0; i < linecard->ops->supported_types_count; i++) {
>> +		if (nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE,
>> +				   linecard->ops->supported_types[i]))
>> +			goto nla_put_failure;
>> +	}
>> +	nla_nest_end(msg, attr);
>>  
>>  	genlmsg_end(msg, hdr);
>>  	return 0;
>> @@ -1300,6 +1318,68 @@ static int devlink_nl_cmd_linecard_get_dumpit(struct sk_buff *msg,
>>  	return msg->len;
>>  }
>>  
>> +static int devlink_nl_cmd_linecard_provision_doit(struct sk_buff *skb,
>> +						  struct genl_info *info)
>> +{
>> +	struct devlink_linecard *linecard = info->user_ptr[1];
>> +	const char *type;
>> +	int i;
>> +
>> +	if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) {
>> +		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is currently being provisioned");
>> +		return -EBUSY;
>> +	}
>> +	if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) {
>> +		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is currently being unprovisioned");
>> +		return -EBUSY;
>> +	}
>> +	if (linecard->state != DEVLINK_LINECARD_STATE_UNPROVISIONED) {
>> +		NL_SET_ERR_MSG_MOD(info->extack, "Linecard already provisioned");
>> +		return -EBUSY;
>> +	}
>> +
>> +	if (!info->attrs[DEVLINK_ATTR_LINECARD_TYPE]) {
>> +		NL_SET_ERR_MSG_MOD(info->extack, "Provision type not provided");
>> +		return -EINVAL;
>> +	}
>> +
>> +	type = nla_data(info->attrs[DEVLINK_ATTR_LINECARD_TYPE]);
>> +	for (i = 0; i < linecard->ops->supported_types_count; i++) {
>> +		if (!strcmp(linecard->ops->supported_types[i], type)) {
>> +			linecard->state = DEVLINK_LINECARD_STATE_PROVISIONING;
>> +			devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
>> +			return linecard->ops->provision(linecard,
>> +							linecard->priv, i,
>> +							info->extack);
>
>So if this fails user space will see 'provisioning' although nothing is
>being provisioned... Better to set the state and notify if this call did
>not fail

The driver is responsible to either call provision_set/provision_clear
helper. Note the async nature of this op.


>
>> +		}
>> +	}
>> +	NL_SET_ERR_MSG_MOD(info->extack, "Unsupported provision type provided");
>> +	return -EINVAL;
>> +}
>> +
>> +static int devlink_nl_cmd_linecard_unprovision_doit(struct sk_buff *skb,
>> +						    struct genl_info *info)
>> +{
>> +	struct devlink_linecard *linecard = info->user_ptr[1];
>> +
>> +	if (linecard->state == DEVLINK_LINECARD_STATE_PROVISIONING) {
>> +		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is currently being provisioned");
>> +		return -EBUSY;
>> +	}
>> +	if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONING) {
>> +		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is currently being unprovisioned");
>> +		return -EBUSY;
>> +	}
>> +	if (linecard->state == DEVLINK_LINECARD_STATE_UNPROVISIONED) {
>> +		NL_SET_ERR_MSG_MOD(info->extack, "Linecard is not provisioned");
>> +		return -EOPNOTSUPP;
>> +	}
>> +	linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONING;
>> +	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
>> +	return linecard->ops->unprovision(linecard, linecard->priv,
>> +					  info->extack);
>> +}
>> +
>>  static int devlink_nl_sb_fill(struct sk_buff *msg, struct devlink *devlink,
>>  			      struct devlink_sb *devlink_sb,
>>  			      enum devlink_command cmd, u32 portid,
>> @@ -7759,6 +7839,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
>>  							DEVLINK_RELOAD_ACTION_MAX),
>>  	[DEVLINK_ATTR_RELOAD_LIMITS] = NLA_POLICY_BITFIELD32(DEVLINK_RELOAD_LIMITS_VALID_MASK),
>>  	[DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 },
>> +	[DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING },
>>  };
>>  
>>  static const struct genl_small_ops devlink_nl_ops[] = {
>> @@ -7806,6 +7887,20 @@ static const struct genl_small_ops devlink_nl_ops[] = {
>>  		.internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD,
>>  		/* can be retrieved by unprivileged users */
>>  	},
>> +	{
>> +		.cmd = DEVLINK_CMD_LINECARD_PROVISION,
>> +		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>> +		.doit = devlink_nl_cmd_linecard_provision_doit,
>> +		.flags = GENL_ADMIN_PERM,
>> +		.internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD,
>> +	},
>> +	{
>> +		.cmd = DEVLINK_CMD_LINECARD_UNPROVISION,
>> +		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>> +		.doit = devlink_nl_cmd_linecard_unprovision_doit,
>> +		.flags = GENL_ADMIN_PERM,
>> +		.internal_flags = DEVLINK_NL_FLAG_NEED_LINECARD,
>> +	},
>>  	{
>>  		.cmd = DEVLINK_CMD_SB_GET,
>>  		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>> @@ -8613,11 +8708,17 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
>>   *	Create devlink linecard instance with provided linecard index.
>>   *	Caller can use any indexing, even hw-related one.
>>   */
>> -struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
>> -						 unsigned int linecard_index)
>> +struct devlink_linecard *
>> +devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index,
>> +			const struct devlink_linecard_ops *ops, void *priv)
>>  {
>>  	struct devlink_linecard *linecard;
>>  
>> +	if (WARN_ON(!ops || !ops->supported_types ||
>> +		    !ops->supported_types_count ||
>> +		    !ops->provision || !ops->unprovision))
>> +		return ERR_PTR(-EINVAL);
>> +
>>  	mutex_lock(&devlink->lock);
>>  	if (devlink_linecard_index_exists(devlink, linecard_index)) {
>>  		mutex_unlock(&devlink->lock);
>> @@ -8630,6 +8731,9 @@ struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
>>  
>>  	linecard->devlink = devlink;
>>  	linecard->index = linecard_index;
>> +	linecard->ops = ops;
>> +	linecard->priv = priv;
>> +	linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED;
>>  	list_add_tail(&linecard->list, &devlink->linecard_list);
>>  	mutex_unlock(&devlink->lock);
>>  	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
>> @@ -8653,6 +8757,39 @@ void devlink_linecard_destroy(struct devlink_linecard *linecard)
>>  }
>>  EXPORT_SYMBOL_GPL(devlink_linecard_create);
>>  
>> +/**
>> + *	devlink_linecard_provision_set - Set provisioning on linecard
>
>'Set linecard as provisioned' maybe?

Sure, why not.


>
>> + *
>> + *	@devlink_linecard: devlink linecard
>> + *	@type_index: index of the linecard type (in array of types in ops)
>> + */
>> +void devlink_linecard_provision_set(struct devlink_linecard *linecard,
>> +				    u32 type_index)
>> +{
>> +	WARN_ON(type_index >= linecard->ops->supported_types_count);
>
>Wouldn't this explode below when you use the index to access the array?
>Maybe better to just warn and return

Okay.


>
>> +	mutex_lock(&linecard->devlink->lock);
>> +	linecard->state = DEVLINK_LINECARD_STATE_PROVISIONED;
>> +	linecard->provisioned_type = linecard->ops->supported_types[type_index];
>> +	mutex_unlock(&linecard->devlink->lock);
>> +	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
>> +}
>> +EXPORT_SYMBOL_GPL(devlink_linecard_provision_set);
>> +
>> +/**
>> + *	devlink_linecard_provision_clear - Clear provisioning on linecard
>
>'Set linecard as unprovisioned' maybe?

Sure, why not.


>
>> + *
>> + *	@devlink_linecard: devlink linecard
>> + */
>> +void devlink_linecard_provision_clear(struct devlink_linecard *linecard)
>> +{
>> +	mutex_lock(&linecard->devlink->lock);
>> +	linecard->state = DEVLINK_LINECARD_STATE_UNPROVISIONED;
>> +	linecard->provisioned_type = NULL;
>> +	mutex_unlock(&linecard->devlink->lock);
>> +	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
>> +}
>> +EXPORT_SYMBOL_GPL(devlink_linecard_provision_clear);
>> +
>>  int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
>>  			u32 size, u16 ingress_pools_count,
>>  			u16 egress_pools_count, u16 ingress_tc_count,
>> -- 
>> 2.26.2
>> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 03/10] devlink: implement line card active state
  2021-01-15 16:06   ` Ido Schimmel
@ 2021-01-15 16:52     ` Jiri Pirko
  0 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-15 16:52 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

Fri, Jan 15, 2021 at 05:06:08PM CET, idosch@idosch.org wrote:
>On Wed, Jan 13, 2021 at 01:12:15PM +0100, Jiri Pirko wrote:
>> +/**
>> + *	devlink_linecard_deactivate - Set linecard deactive
>
>Set linecard as inactive

Okay.

>
>> + *
>> + *	@devlink_linecard: devlink linecard
>> + */
>> +void devlink_linecard_deactivate(struct devlink_linecard *linecard)
>> +{
>> +	mutex_lock(&linecard->devlink->lock);
>> +	WARN_ON(linecard->state != DEVLINK_LINECARD_STATE_ACTIVE);
>> +	linecard->state = DEVLINK_LINECARD_STATE_PROVISIONED;
>> +	devlink_linecard_notify(linecard, DEVLINK_CMD_LINECARD_NEW);
>> +	mutex_unlock(&linecard->devlink->lock);
>> +}
>> +EXPORT_SYMBOL_GPL(devlink_linecard_deactivate);
>> +
>> +/**
>> + *	devlink_linecard_is_active - Check if active
>> + *
>> + *	@devlink_linecard: devlink linecard
>> + */
>> +bool devlink_linecard_is_active(struct devlink_linecard *linecard)
>> +{
>> +	bool active;
>> +
>> +	mutex_lock(&linecard->devlink->lock);
>> +	active = linecard->state == DEVLINK_LINECARD_STATE_ACTIVE;
>> +	mutex_unlock(&linecard->devlink->lock);
>> +	return active;
>> +}
>> +EXPORT_SYMBOL_GPL(devlink_linecard_is_active);
>> +
>>  int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
>>  			u32 size, u16 ingress_pools_count,
>>  			u16 egress_pools_count, u16 ingress_tc_count,
>> -- 
>> 2.26.2
>> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 05/10] devlink: add port to line card relationship set
  2021-01-15 16:10   ` Ido Schimmel
@ 2021-01-15 16:53     ` Jiri Pirko
  0 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-15 16:53 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

Fri, Jan 15, 2021 at 05:10:48PM CET, idosch@idosch.org wrote:
>On Wed, Jan 13, 2021 at 01:12:17PM +0100, Jiri Pirko wrote:
>> index ec00cd94c626..cb911b6fdeda 100644
>> --- a/include/net/devlink.h
>> +++ b/include/net/devlink.h
>> @@ -137,6 +137,7 @@ struct devlink_port {
>>  	struct delayed_work type_warn_dw;
>>  	struct list_head reporter_list;
>>  	struct mutex reporters_lock; /* Protects reporter_list */
>> +	struct devlink_linecard *linecard;
>>  };
>>  
>>  struct devlink_linecard_ops;
>> @@ -1438,6 +1439,8 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro
>>  				   u16 pf, bool external);
>>  void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
>>  				   u16 pf, u16 vf, bool external);
>> +void devlink_port_linecard_set(struct devlink_port *devlink_port,
>> +			       struct devlink_linecard *linecard);
>>  struct devlink_linecard *
>>  devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index,
>>  			const struct devlink_linecard_ops *ops, void *priv);
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index 347976b88404..2faa30cc5cce 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -855,6 +855,10 @@ static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
>>  		goto nla_put_failure;
>>  	if (devlink_nl_port_function_attrs_put(msg, devlink_port, extack))
>>  		goto nla_put_failure;
>> +	if (devlink_port->linecard &&
>> +	    nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX,
>> +			devlink_port->linecard->index))
>> +		goto nla_put_failure;
>>  
>>  	genlmsg_end(msg, hdr);
>>  	return 0;
>> @@ -8642,6 +8646,21 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro
>>  }
>>  EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set);
>>  
>> +/**
>> + *	devlink_port_linecard_set - Link port with a linecard
>> + *
>> + *	@devlink_port: devlink port
>> + *	@devlink_linecard: devlink linecard
>> + */
>> +void devlink_port_linecard_set(struct devlink_port *devlink_port,
>> +			       struct devlink_linecard *linecard)
>> +{
>> +	if (WARN_ON(devlink_port->registered))
>> +		return;
>> +	devlink_port->linecard = linecard;
>
>We already have devlink_port_attrs_set() that is called before the port
>is registered, why not extend it to also set the linecard information?

I was thinking about that. Looked odd to put the linecard pointer to the
attr struct. I like it better this way. But if you insist.


>
>> +}
>> +EXPORT_SYMBOL_GPL(devlink_port_linecard_set);
>> +
>>  static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
>>  					     char *name, size_t len)
>>  {
>> @@ -8654,7 +8673,11 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
>>  	switch (attrs->flavour) {
>>  	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
>>  	case DEVLINK_PORT_FLAVOUR_VIRTUAL:
>> -		n = snprintf(name, len, "p%u", attrs->phys.port_number);
>> +		if (devlink_port->linecard)
>> +			n = snprintf(name, len, "l%u",
>> +				     devlink_port->linecard->index);
>> +		n += snprintf(name + n, len - n, "p%u",
>> +			      attrs->phys.port_number);
>>  		if (attrs->split)
>>  			n += snprintf(name + n, len - n, "s%u",
>>  				      attrs->phys.split_subport_number);
>> -- 
>> 2.26.2
>> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 08/10] netdevsim: create devlink line card object and implement provisioning
  2021-01-15 16:30   ` Ido Schimmel
@ 2021-01-15 16:54     ` Jiri Pirko
  0 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-15 16:54 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

Fri, Jan 15, 2021 at 05:30:58PM CET, idosch@idosch.org wrote:
>On Wed, Jan 13, 2021 at 01:12:20PM +0100, Jiri Pirko wrote:
>> @@ -977,6 +1012,9 @@ static int __nsim_dev_port_add(struct nsim_dev *nsim_dev,
>>  	memcpy(attrs.switch_id.id, nsim_dev->switch_id.id, nsim_dev->switch_id.id_len);
>>  	attrs.switch_id.id_len = nsim_dev->switch_id.id_len;
>>  	devlink_port_attrs_set(devlink_port, &attrs);
>> +	if (nsim_dev_linecard)
>> +		devlink_port_linecard_set(devlink_port,
>> +					  nsim_dev_linecard->devlink_linecard);
>
>Should be fold into devlink_port_attrs_set()
>
>>  	err = devlink_port_register(priv_to_devlink(nsim_dev), devlink_port,
>>  				    nsim_dev_port->port_index);
>>  	if (err)
>> @@ -1053,10 +1091,88 @@ static int nsim_dev_port_add_all(struct nsim_dev *nsim_dev,
>>  	return err;
>>  }
>>  
>> +static void nsim_dev_linecard_provision_work(struct work_struct *work)
>> +{
>> +	struct nsim_dev_linecard *nsim_dev_linecard;
>> +	struct nsim_bus_dev *nsim_bus_dev;
>> +	int err;
>> +	int i;
>> +
>> +	nsim_dev_linecard = container_of(work, struct nsim_dev_linecard,
>> +					 provision_work);
>> +
>> +	nsim_bus_dev = nsim_dev_linecard->nsim_dev->nsim_bus_dev;
>> +	for (i = 0; i < nsim_dev_linecard_port_count(nsim_dev_linecard); i++) {
>> +		err = nsim_dev_port_add(nsim_bus_dev, nsim_dev_linecard, i);
>> +		if (err)
>> +			goto err_port_del_all;
>> +	}
>> +	nsim_dev_linecard->provisioned = true;
>> +	devlink_linecard_provision_set(nsim_dev_linecard->devlink_linecard,
>> +				       nsim_dev_linecard->type_index);
>> +	return;
>> +
>> +err_port_del_all:
>> +	for (i--; i >= 0; i--)
>> +		nsim_dev_port_del(nsim_bus_dev, nsim_dev_linecard, i);
>> +	devlink_linecard_provision_clear(nsim_dev_linecard->devlink_linecard);
>> +}
>> +
>> +static int nsim_dev_linecard_provision(struct devlink_linecard *linecard,
>> +				       void *priv, u32 type_index,
>> +				       struct netlink_ext_ack *extack)
>> +{
>> +	struct nsim_dev_linecard *nsim_dev_linecard = priv;
>> +
>> +	nsim_dev_linecard->type_index = type_index;
>> +	INIT_WORK(&nsim_dev_linecard->provision_work,
>> +		  nsim_dev_linecard_provision_work);
>> +	schedule_work(&nsim_dev_linecard->provision_work);
>> +
>> +	return 0;
>> +}
>> +
>> +static void nsim_dev_linecard_unprovision_work(struct work_struct *work)
>> +{
>> +	struct nsim_dev_linecard *nsim_dev_linecard;
>> +	struct nsim_bus_dev *nsim_bus_dev;
>> +	int i;
>> +
>> +	nsim_dev_linecard = container_of(work, struct nsim_dev_linecard,
>> +					 provision_work);
>> +
>> +	nsim_bus_dev = nsim_dev_linecard->nsim_dev->nsim_bus_dev;
>> +	nsim_dev_linecard->provisioned = false;
>> +	devlink_linecard_provision_clear(nsim_dev_linecard->devlink_linecard);
>> +	for (i = 0; i < nsim_dev_linecard_port_count(nsim_dev_linecard); i++)
>> +		nsim_dev_port_del(nsim_bus_dev, nsim_dev_linecard, i);
>> +}
>> +
>> +static int nsim_dev_linecard_unprovision(struct devlink_linecard *linecard,
>> +					 void *priv,
>> +					 struct netlink_ext_ack *extack)
>> +{
>> +	struct nsim_dev_linecard *nsim_dev_linecard = priv;
>> +
>> +	INIT_WORK(&nsim_dev_linecard->provision_work,
>> +		  nsim_dev_linecard_unprovision_work);
>> +	schedule_work(&nsim_dev_linecard->provision_work);
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct devlink_linecard_ops nsim_dev_linecard_ops = {
>> +	.supported_types = nsim_dev_linecard_supported_types,
>> +	.supported_types_count = ARRAY_SIZE(nsim_dev_linecard_supported_types),
>> +	.provision = nsim_dev_linecard_provision,
>> +	.unprovision = nsim_dev_linecard_unprovision,
>> +};
>> +
>>  static int __nsim_dev_linecard_add(struct nsim_dev *nsim_dev,
>>  				   unsigned int linecard_index)
>>  {
>>  	struct nsim_dev_linecard *nsim_dev_linecard;
>> +	struct devlink_linecard *devlink_linecard;
>>  	int err;
>>  
>>  	nsim_dev_linecard = kzalloc(sizeof(*nsim_dev_linecard), GFP_KERNEL);
>> @@ -1066,14 +1182,27 @@ static int __nsim_dev_linecard_add(struct nsim_dev *nsim_dev,
>>  	nsim_dev_linecard->linecard_index = linecard_index;
>>  	INIT_LIST_HEAD(&nsim_dev_linecard->port_list);
>>  
>> +	devlink_linecard = devlink_linecard_create(priv_to_devlink(nsim_dev),
>> +						   linecard_index,
>> +						   &nsim_dev_linecard_ops,
>> +						   nsim_dev_linecard);
>> +	if (IS_ERR(devlink_linecard)) {
>> +		err = PTR_ERR(devlink_linecard);
>> +		goto err_linecard_free;
>> +	}
>> +
>> +	nsim_dev_linecard->devlink_linecard = devlink_linecard;
>> +
>>  	err = nsim_dev_linecard_debugfs_init(nsim_dev, nsim_dev_linecard);
>>  	if (err)
>> -		goto err_linecard_free;
>> +		goto err_dl_linecard_destroy;
>>  
>>  	list_add(&nsim_dev_linecard->list, &nsim_dev->linecard_list);
>>  
>>  	return 0;
>>  
>> +err_dl_linecard_destroy:
>> +	devlink_linecard_destroy(devlink_linecard);
>>  err_linecard_free:
>>  	kfree(nsim_dev_linecard);
>>  	return err;
>> @@ -1081,8 +1210,12 @@ static int __nsim_dev_linecard_add(struct nsim_dev *nsim_dev,
>>  
>>  static void __nsim_dev_linecard_del(struct nsim_dev_linecard *nsim_dev_linecard)
>>  {
>> +	struct devlink_linecard *devlink_linecard =
>> +					nsim_dev_linecard->devlink_linecard;
>> +
>>  	list_del(&nsim_dev_linecard->list);
>>  	nsim_dev_linecard_debugfs_exit(nsim_dev_linecard);
>
>What about the delayed work? I believe it can run while you are
>destroying the linecard, so cancel_delayed_work_sync() is needed

Sure, I missed that.


>
>> +	devlink_linecard_destroy(devlink_linecard);
>>  	kfree(nsim_dev_linecard);
>>  }
>>  
>> diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
>> index 88b61b9390bf..ab217b361416 100644
>> --- a/drivers/net/netdevsim/netdevsim.h
>> +++ b/drivers/net/netdevsim/netdevsim.h
>> @@ -196,10 +196,14 @@ struct nsim_dev;
>>  
>>  struct nsim_dev_linecard {
>>  	struct list_head list;
>> +	struct devlink_linecard *devlink_linecard;
>>  	struct nsim_dev *nsim_dev;
>>  	struct list_head port_list;
>>  	unsigned int linecard_index;
>>  	struct dentry *ddir;
>> +	bool provisioned;
>> +	u32 type_index;
>> +	struct work_struct provision_work;
>>  };
>>  
>>  struct nsim_dev {
>> -- 
>> 2.26.2
>> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-15 15:43 ` Ido Schimmel
@ 2021-01-15 16:55   ` Jiri Pirko
  2021-01-15 18:01     ` Ido Schimmel
  0 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-15 16:55 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

Fri, Jan 15, 2021 at 04:43:57PM CET, idosch@idosch.org wrote:
>On Wed, Jan 13, 2021 at 01:12:12PM +0100, Jiri Pirko wrote:
>> # Create a new netdevsim device, with no ports and 2 line cards:
>> $ echo "10 0 2" >/sys/bus/netdevsim/new_device
>> $ devlink port # No ports are listed
>> $ devlink lc
>> netdevsim/netdevsim10:
>>   lc 0 state unprovisioned
>>     supported_types:
>>        card1port card2ports card4ports
>>   lc 1 state unprovisioned
>>     supported_types:
>>        card1port card2ports card4ports
>> 
>> # Note that driver advertizes supported line card types. In case of
>> # netdevsim, these are 3.
>> 
>> $ devlink lc provision netdevsim/netdevsim10 lc 0 type card4ports
>
>Why do we need a separate command for that? You actually introduced
>'DEVLINK_CMD_LINECARD_SET' in patch #1, but it's never used.
>
>I prefer:
>
>devlink lc set netdevsim/netdevsim10 index 0 state provision type card4ports

This is misleading. This is actually not setting state. The state gets
changed upon successful provisioning process. Also, one may think that
he can set other states, but he can't. I don't like this at all :/


>devlink lc set netdevsim/netdevsim10 index 0 state unprovision
>
>It is consistent with the GET/SET/NEW/DEL pattern used by other
>commands.

Not really, see split port for example. This is similar to that.

>
>> $ devlink lc
>> netdevsim/netdevsim10:
>>   lc 0 state provisioned type card4ports
>>     supported_types:
>>        card1port card2ports card4ports
>>   lc 1 state unprovisioned
>>     supported_types:
>>        card1port card2ports card4ports
>> $ devlink port
>> netdevsim/netdevsim10/1000: type eth netdev eni10nl0p1 flavour physical lc 0 port 1 splittable false
>> netdevsim/netdevsim10/1001: type eth netdev eni10nl0p2 flavour physical lc 0 port 2 splittable false
>> netdevsim/netdevsim10/1002: type eth netdev eni10nl0p3 flavour physical lc 0 port 3 splittable false
>> netdevsim/netdevsim10/1003: type eth netdev eni10nl0p4 flavour physical lc 0 port 4 splittable false
>> #                                                 ^^                    ^^^^
>> #                                     netdev name adjusted          index of a line card this port belongs to
>> 
>> $ ip link set eni10nl0p1 up 
>> $ ip link show eni10nl0p1   
>> 165: eni10nl0p1: <NO-CARRIER,BROADCAST,NOARP,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
>>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
>> 
>> # Now activate the line card using debugfs. That emulates plug-in event
>> # on real hardware:
>> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
>> $ ip link show eni10nl0p1
>> 165: eni10nl0p1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
>> # The carrier is UP now.
>> 
>> Jiri Pirko (10):
>>   devlink: add support to create line card and expose to user
>>   devlink: implement line card provisioning
>>   devlink: implement line card active state
>>   devlink: append split port number to the port name
>>   devlink: add port to line card relationship set
>>   netdevsim: introduce line card support
>>   netdevsim: allow port objects to be linked with line cards
>>   netdevsim: create devlink line card object and implement provisioning
>>   netdevsim: implement line card activation
>>   selftests: add netdevsim devlink lc test
>> 
>>  drivers/net/netdevsim/bus.c                   |  21 +-
>>  drivers/net/netdevsim/dev.c                   | 370 ++++++++++++++-
>>  drivers/net/netdevsim/netdev.c                |   2 +
>>  drivers/net/netdevsim/netdevsim.h             |  23 +
>>  include/net/devlink.h                         |  44 ++
>>  include/uapi/linux/devlink.h                  |  25 +
>>  net/core/devlink.c                            | 443 +++++++++++++++++-
>>  .../drivers/net/netdevsim/devlink.sh          |  62 ++-
>>  8 files changed, 964 insertions(+), 26 deletions(-)
>> 
>> -- 
>> 2.26.2
>> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-15 16:55   ` Jiri Pirko
@ 2021-01-15 18:01     ` Ido Schimmel
  2021-01-18 13:03       ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Ido Schimmel @ 2021-01-15 18:01 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

On Fri, Jan 15, 2021 at 05:55:59PM +0100, Jiri Pirko wrote:
> Fri, Jan 15, 2021 at 04:43:57PM CET, idosch@idosch.org wrote:
> >On Wed, Jan 13, 2021 at 01:12:12PM +0100, Jiri Pirko wrote:
> >> # Create a new netdevsim device, with no ports and 2 line cards:
> >> $ echo "10 0 2" >/sys/bus/netdevsim/new_device
> >> $ devlink port # No ports are listed
> >> $ devlink lc
> >> netdevsim/netdevsim10:
> >>   lc 0 state unprovisioned
> >>     supported_types:
> >>        card1port card2ports card4ports
> >>   lc 1 state unprovisioned
> >>     supported_types:
> >>        card1port card2ports card4ports
> >> 
> >> # Note that driver advertizes supported line card types. In case of
> >> # netdevsim, these are 3.
> >> 
> >> $ devlink lc provision netdevsim/netdevsim10 lc 0 type card4ports
> >
> >Why do we need a separate command for that? You actually introduced
> >'DEVLINK_CMD_LINECARD_SET' in patch #1, but it's never used.
> >
> >I prefer:
> >
> >devlink lc set netdevsim/netdevsim10 index 0 state provision type card4ports
> 
> This is misleading. This is actually not setting state. The state gets
> changed upon successful provisioning process. Also, one may think that
> he can set other states, but he can't. I don't like this at all :/

So make state a read-only attribute. You really only care about setting
the type.

To provision:

# devlink lc set netdevsim/netdevsim10 index 0 type card4ports

To unprovsion:

# devlink lc set netdevsim/netdevsim10 index 0 type none

Or:

# devlink lc set netdevsim/netdevsim10 index 0 notype

> 
> 
> >devlink lc set netdevsim/netdevsim10 index 0 state unprovision
> >
> >It is consistent with the GET/SET/NEW/DEL pattern used by other
> >commands.
> 
> Not really, see split port for example. This is similar to that.

It's not. The split command creates new objects whereas this command
modifies an existing object.

> 
> >
> >> $ devlink lc
> >> netdevsim/netdevsim10:
> >>   lc 0 state provisioned type card4ports
> >>     supported_types:
> >>        card1port card2ports card4ports
> >>   lc 1 state unprovisioned
> >>     supported_types:
> >>        card1port card2ports card4ports
> >> $ devlink port
> >> netdevsim/netdevsim10/1000: type eth netdev eni10nl0p1 flavour physical lc 0 port 1 splittable false
> >> netdevsim/netdevsim10/1001: type eth netdev eni10nl0p2 flavour physical lc 0 port 2 splittable false
> >> netdevsim/netdevsim10/1002: type eth netdev eni10nl0p3 flavour physical lc 0 port 3 splittable false
> >> netdevsim/netdevsim10/1003: type eth netdev eni10nl0p4 flavour physical lc 0 port 4 splittable false
> >> #                                                 ^^                    ^^^^
> >> #                                     netdev name adjusted          index of a line card this port belongs to
> >> 
> >> $ ip link set eni10nl0p1 up 
> >> $ ip link show eni10nl0p1   
> >> 165: eni10nl0p1: <NO-CARRIER,BROADCAST,NOARP,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
> >>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
> >> 
> >> # Now activate the line card using debugfs. That emulates plug-in event
> >> # on real hardware:
> >> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
> >> $ ip link show eni10nl0p1
> >> 165: eni10nl0p1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> >>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
> >> # The carrier is UP now.
> >> 
> >> Jiri Pirko (10):
> >>   devlink: add support to create line card and expose to user
> >>   devlink: implement line card provisioning
> >>   devlink: implement line card active state
> >>   devlink: append split port number to the port name
> >>   devlink: add port to line card relationship set
> >>   netdevsim: introduce line card support
> >>   netdevsim: allow port objects to be linked with line cards
> >>   netdevsim: create devlink line card object and implement provisioning
> >>   netdevsim: implement line card activation
> >>   selftests: add netdevsim devlink lc test
> >> 
> >>  drivers/net/netdevsim/bus.c                   |  21 +-
> >>  drivers/net/netdevsim/dev.c                   | 370 ++++++++++++++-
> >>  drivers/net/netdevsim/netdev.c                |   2 +
> >>  drivers/net/netdevsim/netdevsim.h             |  23 +
> >>  include/net/devlink.h                         |  44 ++
> >>  include/uapi/linux/devlink.h                  |  25 +
> >>  net/core/devlink.c                            | 443 +++++++++++++++++-
> >>  .../drivers/net/netdevsim/devlink.sh          |  62 ++-
> >>  8 files changed, 964 insertions(+), 26 deletions(-)
> >> 
> >> -- 
> >> 2.26.2
> >> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 02/10] devlink: implement line card provisioning
  2021-01-15 16:51     ` Jiri Pirko
@ 2021-01-15 18:09       ` Ido Schimmel
  2021-01-18 12:50         ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Ido Schimmel @ 2021-01-15 18:09 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

On Fri, Jan 15, 2021 at 05:51:57PM +0100, Jiri Pirko wrote:
> Fri, Jan 15, 2021 at 05:03:19PM CET, idosch@idosch.org wrote:
> >On Wed, Jan 13, 2021 at 01:12:14PM +0100, Jiri Pirko wrote:
> >> From: Jiri Pirko <jiri@nvidia.com>
> >> 
> >> In order to be able to configure all needed stuff on a port/netdevice
> >> of a line card without the line card being present, introduce line card
> >> provisioning. Basically provisioning will create a placeholder for
> >> instances (ports/netdevices) for a line card type.
> >> 
> >> Allow the user to query the supported line card types over line card
> >> get command. Then implement two netlink commands to allow user to
> >> provision/unprovision the line card with selected line card type.
> >> 
> >> On the driver API side, add provision/unprovision ops and supported
> >> types array to be advertised. Upon provision op call, the driver should
> >> take care of creating the instances for the particular line card type.
> >> Introduce provision_set/clear() functions to be called by the driver
> >> once the provisioning/unprovisioning is done on its side.
> >> 
> >> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
> >> ---
> >>  include/net/devlink.h        |  31 +++++++-
> >>  include/uapi/linux/devlink.h |  17 +++++
> >>  net/core/devlink.c           | 141 ++++++++++++++++++++++++++++++++++-
> >>  3 files changed, 185 insertions(+), 4 deletions(-)
> >> 
> >> diff --git a/include/net/devlink.h b/include/net/devlink.h
> >> index 67c2547d5ef9..854abd53e9ea 100644
> >> --- a/include/net/devlink.h
> >> +++ b/include/net/devlink.h
> >> @@ -139,10 +139,33 @@ struct devlink_port {
> >>  	struct mutex reporters_lock; /* Protects reporter_list */
> >>  };
> >>  
> >> +struct devlink_linecard_ops;
> >> +
> >>  struct devlink_linecard {
> >>  	struct list_head list;
> >>  	struct devlink *devlink;
> >>  	unsigned int index;
> >> +	const struct devlink_linecard_ops *ops;
> >> +	void *priv;
> >> +	enum devlink_linecard_state state;
> >> +	const char *provisioned_type;
> >> +};
> >> +
> >> +/**
> >> + * struct devlink_linecard_ops - Linecard operations
> >> + * @supported_types: array of supported types of linecards
> >> + * @supported_types_count: number of elements in the array above
> >> + * @provision: callback to provision the linecard slot with certain
> >> + *	       type of linecard
> >> + * @unprovision: callback to unprovision the linecard slot
> >> + */
> >> +struct devlink_linecard_ops {
> >> +	const char **supported_types;
> >> +	unsigned int supported_types_count;
> >> +	int (*provision)(struct devlink_linecard *linecard, void *priv,
> >> +			 u32 type_index, struct netlink_ext_ack *extack);
> >> +	int (*unprovision)(struct devlink_linecard *linecard, void *priv,
> >> +			   struct netlink_ext_ack *extack);
> >>  };
> >>  
> >>  struct devlink_sb_pool_info {
> >> @@ -1414,9 +1437,13 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro
> >>  				   u16 pf, bool external);
> >>  void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
> >>  				   u16 pf, u16 vf, bool external);
> >> -struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
> >> -						 unsigned int linecard_index);
> >> +struct devlink_linecard *
> >> +devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index,
> >> +			const struct devlink_linecard_ops *ops, void *priv);
> >>  void devlink_linecard_destroy(struct devlink_linecard *linecard);
> >> +void devlink_linecard_provision_set(struct devlink_linecard *linecard,
> >> +				    u32 type_index);
> >> +void devlink_linecard_provision_clear(struct devlink_linecard *linecard);
> >>  int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
> >>  			u32 size, u16 ingress_pools_count,
> >>  			u16 egress_pools_count, u16 ingress_tc_count,
> >> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> >> index e5ed0522591f..4111ddcc000b 100644
> >> --- a/include/uapi/linux/devlink.h
> >> +++ b/include/uapi/linux/devlink.h
> >> @@ -131,6 +131,9 @@ enum devlink_command {
> >>  	DEVLINK_CMD_LINECARD_NEW,
> >>  	DEVLINK_CMD_LINECARD_DEL,
> >>  
> >> +	DEVLINK_CMD_LINECARD_PROVISION,
> >> +	DEVLINK_CMD_LINECARD_UNPROVISION,
> >
> >I do not really see the point in these two commands. Better extend
> >DEVLINK_CMD_LINECARD_SET to carry these attributes.
> 
> Yeah, I was thinking about that. Not sure it is correct though. This is
> single purpose command. It really does not change "an attribute" as the
> "_SET" commands are usually doing. Consider extension of "_SET" by other
> attributes. Then it looks wrong.

It is setting the type of the linecard, which is an attribute of the
linecard.

> 
> 
> >
> >> +
> >>  	/* add new commands above here */
> >>  	__DEVLINK_CMD_MAX,
> >>  	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
> >> @@ -329,6 +332,17 @@ enum devlink_reload_limit {
> >>  
> >>  #define DEVLINK_RELOAD_LIMITS_VALID_MASK (_BITUL(__DEVLINK_RELOAD_LIMIT_MAX) - 1)
> >>  
> >> +enum devlink_linecard_state {
> >> +	DEVLINK_LINECARD_STATE_UNSPEC,
> >> +	DEVLINK_LINECARD_STATE_UNPROVISIONED,
> >> +	DEVLINK_LINECARD_STATE_UNPROVISIONING,
> >> +	DEVLINK_LINECARD_STATE_PROVISIONING,
> >
> >Can you explain why these two states are necessary? Any reason the
> >provision operation can't be synchronous? This is somewhat explained in
> >patch #8, but it should really be explained here. Changelog says:
> >
> >"To avoid deadlock and to mimic actual HW flow, use workqueue
> >to add/del ports during provisioning as the port add/del calls
> >devlink_port_register/unregister() which take devlink mutex."
> >
> >The deadlock is not really a reason to have these states.
> 
> It is, need to avoid recursice locking
> 
> >'DEVLINK_CMD_PORT_SPLIT' also calls devlink_port_register() /
> >devlink_port_unregister() and the deadlock is solved by:
> >
> >'internal_flags = DEVLINK_NL_FLAG_NO_LOCK'
> 
> Yeah, however, there, the port_index is passed down to the driver, not
> the actual object pointer. That's why it can be done like that.
> 
> >
> >A hardware flow the requires it is something else...
> 
> Hardware flow in case of Spectrum is async too.

OK, so the changelog needs to state that these states are necessary
because the nature of linecard provisioning is asynchronous.

> 
> 
> >
> >> +	DEVLINK_LINECARD_STATE_PROVISIONED,
> >> +
> >> +	__DEVLINK_LINECARD_STATE_MAX,
> >> +	DEVLINK_LINECARD_STATE_MAX = __DEVLINK_LINECARD_STATE_MAX - 1
> >> +};
> >> +
> >>  enum devlink_attr {
> >>  	/* don't change the order or add anything between, this is ABI! */
> >>  	DEVLINK_ATTR_UNSPEC,
> >> @@ -535,6 +549,9 @@ enum devlink_attr {
> >>  	DEVLINK_ATTR_RELOAD_ACTION_STATS,       /* nested */
> >>  
> >>  	DEVLINK_ATTR_LINECARD_INDEX,		/* u32 */
> >> +	DEVLINK_ATTR_LINECARD_STATE,		/* u8 */
> >> +	DEVLINK_ATTR_LINECARD_TYPE,		/* string */
> >> +	DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES,	/* nested */
> >>  
> >>  	/* add new attributes above here, update the policy in devlink.c */
> >>  
> >> diff --git a/net/core/devlink.c b/net/core/devlink.c
> >> index 564e921133cf..434eecc310c3 100644
> >> --- a/net/core/devlink.c
> >> +++ b/net/core/devlink.c
> >> @@ -1192,7 +1192,9 @@ static int devlink_nl_linecard_fill(struct sk_buff *msg,
> >>  				    u32 seq, int flags,
> >>  				    struct netlink_ext_ack *extack)
> >>  {
> >> +	struct nlattr *attr;
> >>  	void *hdr;
> >> +	int i;
> >>  
> >>  	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
> >>  	if (!hdr)
> >> @@ -1202,6 +1204,22 @@ static int devlink_nl_linecard_fill(struct sk_buff *msg,
> >>  		goto nla_put_failure;
> >>  	if (nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX, linecard->index))
> >>  		goto nla_put_failure;
> >> +	if (nla_put_u8(msg, DEVLINK_ATTR_LINECARD_STATE, linecard->state))
> >> +		goto nla_put_failure;
> >> +	if (linecard->state >= DEVLINK_LINECARD_STATE_PROVISIONED &&
> >
> >This assumes that every state added after provisioned should report the
> >type. Better to check for the specific states
> 
> Yes, that is correct assumption.

It is correct now, but what if tomorrow someone adds a new state? It
can't be added before the provisioned state because it will break uapi.

> 
> 
> >
> >> +	    nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE,
> >> +			   linecard->provisioned_type))
> >> +		goto nla_put_failure;

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-15 14:39       ` Jiri Pirko
@ 2021-01-15 19:26         ` Jakub Kicinski
  2021-01-18 13:00           ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Jakub Kicinski @ 2021-01-15 19:26 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, jacob.e.keller, roopa, mlxsw

On Fri, 15 Jan 2021 15:39:06 +0100 Jiri Pirko wrote:
> >I'm not a SFP experts so maybe someone will correct me but AFAIU
> >the QSFP (for optics) is the same regardless of breakout. It's the
> >passive optical strands that are either bundled or not. So there is 
> >no way for the system to detect the cable type (AFAIK).  
> 
> For SFP module, you are able to detect those.

Not sure you understand what I'm saying. Maybe you're thinking about
DACs? This is a optical cable for breakout:

https://www.fs.com/products/68048.html

There is no electronics in it to "detect" things AFAIU. Same QSFP can
be used with this cable or a non-breakout.

> >Or to put it differently IMO the netdev should be provisioned if the
> >system has a port into which user can plug in a cable. When there is   
> 
> Not really. For slit cables, the ports are provisioned not matter which
> cable is connected, slitter 1->2/1->4 or 1->1 cable.
> 
> 
> >a line card-sized hole in the chassis, I'd be surprised to see ports.
> >
> >That said I never worked with real world routers so maybe that's what
> >they do. Maybe some with a Cisco router in the basement can tell us? :)  
> 
> The need for provision/pre-configure splitter/linecard is that the
> ports/netdevices do not disapper/reappear when you replace
> splitter/linecard. Consider a faulty linecard with one port burned. You
> just want to replace it with new one. And in that case, you really don't
> want kernel to remove netdevices and possibly mess up routing for
> example.

Having a single burned port sounds like a relatively rare scenario.
Reconfiguring routing is not the end of the world.

> >If the device really needs this configuration / can't detect things
> >automatically, then we gotta do something like what you have.
> >The only question is do we still want to call it a line card.
> >Sounds more like a front panel module. At Netronome we called 
> >those phymods.  
> 
> Sure, the name is up to the discussion. We call it "linecard"
> internally. I don't care about the name.

Yeah, let's call it something more appropriate to indicate its
breakout/retimer/gearbox nature, and we'll be good :)

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 02/10] devlink: implement line card provisioning
  2021-01-15 18:09       ` Ido Schimmel
@ 2021-01-18 12:50         ` Jiri Pirko
  0 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-18 12:50 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

Fri, Jan 15, 2021 at 07:09:44PM CET, idosch@idosch.org wrote:
>On Fri, Jan 15, 2021 at 05:51:57PM +0100, Jiri Pirko wrote:
>> Fri, Jan 15, 2021 at 05:03:19PM CET, idosch@idosch.org wrote:
>> >On Wed, Jan 13, 2021 at 01:12:14PM +0100, Jiri Pirko wrote:
>> >> From: Jiri Pirko <jiri@nvidia.com>
>> >> 
>> >> In order to be able to configure all needed stuff on a port/netdevice
>> >> of a line card without the line card being present, introduce line card
>> >> provisioning. Basically provisioning will create a placeholder for
>> >> instances (ports/netdevices) for a line card type.
>> >> 
>> >> Allow the user to query the supported line card types over line card
>> >> get command. Then implement two netlink commands to allow user to
>> >> provision/unprovision the line card with selected line card type.
>> >> 
>> >> On the driver API side, add provision/unprovision ops and supported
>> >> types array to be advertised. Upon provision op call, the driver should
>> >> take care of creating the instances for the particular line card type.
>> >> Introduce provision_set/clear() functions to be called by the driver
>> >> once the provisioning/unprovisioning is done on its side.
>> >> 
>> >> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
>> >> ---
>> >>  include/net/devlink.h        |  31 +++++++-
>> >>  include/uapi/linux/devlink.h |  17 +++++
>> >>  net/core/devlink.c           | 141 ++++++++++++++++++++++++++++++++++-
>> >>  3 files changed, 185 insertions(+), 4 deletions(-)
>> >> 
>> >> diff --git a/include/net/devlink.h b/include/net/devlink.h
>> >> index 67c2547d5ef9..854abd53e9ea 100644
>> >> --- a/include/net/devlink.h
>> >> +++ b/include/net/devlink.h
>> >> @@ -139,10 +139,33 @@ struct devlink_port {
>> >>  	struct mutex reporters_lock; /* Protects reporter_list */
>> >>  };
>> >>  
>> >> +struct devlink_linecard_ops;
>> >> +
>> >>  struct devlink_linecard {
>> >>  	struct list_head list;
>> >>  	struct devlink *devlink;
>> >>  	unsigned int index;
>> >> +	const struct devlink_linecard_ops *ops;
>> >> +	void *priv;
>> >> +	enum devlink_linecard_state state;
>> >> +	const char *provisioned_type;
>> >> +};
>> >> +
>> >> +/**
>> >> + * struct devlink_linecard_ops - Linecard operations
>> >> + * @supported_types: array of supported types of linecards
>> >> + * @supported_types_count: number of elements in the array above
>> >> + * @provision: callback to provision the linecard slot with certain
>> >> + *	       type of linecard
>> >> + * @unprovision: callback to unprovision the linecard slot
>> >> + */
>> >> +struct devlink_linecard_ops {
>> >> +	const char **supported_types;
>> >> +	unsigned int supported_types_count;
>> >> +	int (*provision)(struct devlink_linecard *linecard, void *priv,
>> >> +			 u32 type_index, struct netlink_ext_ack *extack);
>> >> +	int (*unprovision)(struct devlink_linecard *linecard, void *priv,
>> >> +			   struct netlink_ext_ack *extack);
>> >>  };
>> >>  
>> >>  struct devlink_sb_pool_info {
>> >> @@ -1414,9 +1437,13 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro
>> >>  				   u16 pf, bool external);
>> >>  void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
>> >>  				   u16 pf, u16 vf, bool external);
>> >> -struct devlink_linecard *devlink_linecard_create(struct devlink *devlink,
>> >> -						 unsigned int linecard_index);
>> >> +struct devlink_linecard *
>> >> +devlink_linecard_create(struct devlink *devlink, unsigned int linecard_index,
>> >> +			const struct devlink_linecard_ops *ops, void *priv);
>> >>  void devlink_linecard_destroy(struct devlink_linecard *linecard);
>> >> +void devlink_linecard_provision_set(struct devlink_linecard *linecard,
>> >> +				    u32 type_index);
>> >> +void devlink_linecard_provision_clear(struct devlink_linecard *linecard);
>> >>  int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
>> >>  			u32 size, u16 ingress_pools_count,
>> >>  			u16 egress_pools_count, u16 ingress_tc_count,
>> >> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>> >> index e5ed0522591f..4111ddcc000b 100644
>> >> --- a/include/uapi/linux/devlink.h
>> >> +++ b/include/uapi/linux/devlink.h
>> >> @@ -131,6 +131,9 @@ enum devlink_command {
>> >>  	DEVLINK_CMD_LINECARD_NEW,
>> >>  	DEVLINK_CMD_LINECARD_DEL,
>> >>  
>> >> +	DEVLINK_CMD_LINECARD_PROVISION,
>> >> +	DEVLINK_CMD_LINECARD_UNPROVISION,
>> >
>> >I do not really see the point in these two commands. Better extend
>> >DEVLINK_CMD_LINECARD_SET to carry these attributes.
>> 
>> Yeah, I was thinking about that. Not sure it is correct though. This is
>> single purpose command. It really does not change "an attribute" as the
>> "_SET" commands are usually doing. Consider extension of "_SET" by other
>> attributes. Then it looks wrong.
>
>It is setting the type of the linecard, which is an attribute of the
>linecard.

Hmm. Still, consider the async nature. Do you have any example of attr
set with async nature? I expect the attr to be set when cmd returns 0.
IDK. Does not feel correct...


>
>> 
>> 
>> >
>> >> +
>> >>  	/* add new commands above here */
>> >>  	__DEVLINK_CMD_MAX,
>> >>  	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
>> >> @@ -329,6 +332,17 @@ enum devlink_reload_limit {
>> >>  
>> >>  #define DEVLINK_RELOAD_LIMITS_VALID_MASK (_BITUL(__DEVLINK_RELOAD_LIMIT_MAX) - 1)
>> >>  
>> >> +enum devlink_linecard_state {
>> >> +	DEVLINK_LINECARD_STATE_UNSPEC,
>> >> +	DEVLINK_LINECARD_STATE_UNPROVISIONED,
>> >> +	DEVLINK_LINECARD_STATE_UNPROVISIONING,
>> >> +	DEVLINK_LINECARD_STATE_PROVISIONING,
>> >
>> >Can you explain why these two states are necessary? Any reason the
>> >provision operation can't be synchronous? This is somewhat explained in
>> >patch #8, but it should really be explained here. Changelog says:
>> >
>> >"To avoid deadlock and to mimic actual HW flow, use workqueue
>> >to add/del ports during provisioning as the port add/del calls
>> >devlink_port_register/unregister() which take devlink mutex."
>> >
>> >The deadlock is not really a reason to have these states.
>> 
>> It is, need to avoid recursice locking
>> 
>> >'DEVLINK_CMD_PORT_SPLIT' also calls devlink_port_register() /
>> >devlink_port_unregister() and the deadlock is solved by:
>> >
>> >'internal_flags = DEVLINK_NL_FLAG_NO_LOCK'
>> 
>> Yeah, however, there, the port_index is passed down to the driver, not
>> the actual object pointer. That's why it can be done like that.
>> 
>> >
>> >A hardware flow the requires it is something else...
>> 
>> Hardware flow in case of Spectrum is async too.
>
>OK, so the changelog needs to state that these states are necessary
>because the nature of linecard provisioning is asynchronous.

Ok.


>
>> 
>> 
>> >
>> >> +	DEVLINK_LINECARD_STATE_PROVISIONED,
>> >> +
>> >> +	__DEVLINK_LINECARD_STATE_MAX,
>> >> +	DEVLINK_LINECARD_STATE_MAX = __DEVLINK_LINECARD_STATE_MAX - 1
>> >> +};
>> >> +
>> >>  enum devlink_attr {
>> >>  	/* don't change the order or add anything between, this is ABI! */
>> >>  	DEVLINK_ATTR_UNSPEC,
>> >> @@ -535,6 +549,9 @@ enum devlink_attr {
>> >>  	DEVLINK_ATTR_RELOAD_ACTION_STATS,       /* nested */
>> >>  
>> >>  	DEVLINK_ATTR_LINECARD_INDEX,		/* u32 */
>> >> +	DEVLINK_ATTR_LINECARD_STATE,		/* u8 */
>> >> +	DEVLINK_ATTR_LINECARD_TYPE,		/* string */
>> >> +	DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES,	/* nested */
>> >>  
>> >>  	/* add new attributes above here, update the policy in devlink.c */
>> >>  
>> >> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> >> index 564e921133cf..434eecc310c3 100644
>> >> --- a/net/core/devlink.c
>> >> +++ b/net/core/devlink.c
>> >> @@ -1192,7 +1192,9 @@ static int devlink_nl_linecard_fill(struct sk_buff *msg,
>> >>  				    u32 seq, int flags,
>> >>  				    struct netlink_ext_ack *extack)
>> >>  {
>> >> +	struct nlattr *attr;
>> >>  	void *hdr;
>> >> +	int i;
>> >>  
>> >>  	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
>> >>  	if (!hdr)
>> >> @@ -1202,6 +1204,22 @@ static int devlink_nl_linecard_fill(struct sk_buff *msg,
>> >>  		goto nla_put_failure;
>> >>  	if (nla_put_u32(msg, DEVLINK_ATTR_LINECARD_INDEX, linecard->index))
>> >>  		goto nla_put_failure;
>> >> +	if (nla_put_u8(msg, DEVLINK_ATTR_LINECARD_STATE, linecard->state))
>> >> +		goto nla_put_failure;
>> >> +	if (linecard->state >= DEVLINK_LINECARD_STATE_PROVISIONED &&
>> >
>> >This assumes that every state added after provisioned should report the
>> >type. Better to check for the specific states
>> 
>> Yes, that is correct assumption.
>
>It is correct now, but what if tomorrow someone adds a new state? It
>can't be added before the provisioned state because it will break uapi.

Then this check will need to be changed...


>
>> 
>> 
>> >
>> >> +	    nla_put_string(msg, DEVLINK_ATTR_LINECARD_TYPE,
>> >> +			   linecard->provisioned_type))
>> >> +		goto nla_put_failure;

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-15 19:26         ` Jakub Kicinski
@ 2021-01-18 13:00           ` Jiri Pirko
  2021-01-18 17:59             ` Jakub Kicinski
  2021-01-18 22:55             ` David Ahern
  0 siblings, 2 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-18 13:00 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem, jacob.e.keller, roopa, mlxsw

Fri, Jan 15, 2021 at 08:26:17PM CET, kuba@kernel.org wrote:
>On Fri, 15 Jan 2021 15:39:06 +0100 Jiri Pirko wrote:
>> >I'm not a SFP experts so maybe someone will correct me but AFAIU
>> >the QSFP (for optics) is the same regardless of breakout. It's the
>> >passive optical strands that are either bundled or not. So there is 
>> >no way for the system to detect the cable type (AFAIK).  
>> 
>> For SFP module, you are able to detect those.
>
>Not sure you understand what I'm saying. Maybe you're thinking about
>DACs? This is a optical cable for breakout:
>
>https://www.fs.com/products/68048.html
>
>There is no electronics in it to "detect" things AFAIU. Same QSFP can
>be used with this cable or a non-breakout.

Ah, got you.


>
>> >Or to put it differently IMO the netdev should be provisioned if the
>> >system has a port into which user can plug in a cable. When there is   
>> 
>> Not really. For slit cables, the ports are provisioned not matter which
>> cable is connected, slitter 1->2/1->4 or 1->1 cable.
>> 
>> 
>> >a line card-sized hole in the chassis, I'd be surprised to see ports.
>> >
>> >That said I never worked with real world routers so maybe that's what
>> >they do. Maybe some with a Cisco router in the basement can tell us? :)  
>> 
>> The need for provision/pre-configure splitter/linecard is that the
>> ports/netdevices do not disapper/reappear when you replace
>> splitter/linecard. Consider a faulty linecard with one port burned. You
>> just want to replace it with new one. And in that case, you really don't
>> want kernel to remove netdevices and possibly mess up routing for
>> example.
>
>Having a single burned port sounds like a relatively rare scenario.

Hmm, rare in scale is common...


>Reconfiguring routing is not the end of the world.

Well, yes, but you don't really want netdevices to come and go then you
plug in/out cables/modules. That's why we have split implemented as we
do. I don't understand why do you think linecards are different.

Plus, I'm not really sure that our hw can report the type, will check.
One way or another, I think that both configuration flows have valid
usecase. Some user may want pre-configuration, some user may want auto.
Btw, it is possible to implement splitter cable in auto mode as well.


>
>> >If the device really needs this configuration / can't detect things
>> >automatically, then we gotta do something like what you have.
>> >The only question is do we still want to call it a line card.
>> >Sounds more like a front panel module. At Netronome we called 
>> >those phymods.  
>> 
>> Sure, the name is up to the discussion. We call it "linecard"
>> internally. I don't care about the name.
>
>Yeah, let's call it something more appropriate to indicate its
>breakout/retimer/gearbox nature, and we'll be good :)

Well, it can contain much more. It can contain a smartnic/fpga/whatever
for example. Not sure we can find something that fits to all cases.
I was thinking about it in the past, I think that the linecard is quite
appropriate. It connects with lines/lanes, and it does something,
either phy/gearbox, or just interconnects the lanes using smartnic/fpga
for example.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-15 18:01     ` Ido Schimmel
@ 2021-01-18 13:03       ` Jiri Pirko
  0 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-18 13:03 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

Fri, Jan 15, 2021 at 07:01:45PM CET, idosch@idosch.org wrote:
>On Fri, Jan 15, 2021 at 05:55:59PM +0100, Jiri Pirko wrote:
>> Fri, Jan 15, 2021 at 04:43:57PM CET, idosch@idosch.org wrote:
>> >On Wed, Jan 13, 2021 at 01:12:12PM +0100, Jiri Pirko wrote:
>> >> # Create a new netdevsim device, with no ports and 2 line cards:
>> >> $ echo "10 0 2" >/sys/bus/netdevsim/new_device
>> >> $ devlink port # No ports are listed
>> >> $ devlink lc
>> >> netdevsim/netdevsim10:
>> >>   lc 0 state unprovisioned
>> >>     supported_types:
>> >>        card1port card2ports card4ports
>> >>   lc 1 state unprovisioned
>> >>     supported_types:
>> >>        card1port card2ports card4ports
>> >> 
>> >> # Note that driver advertizes supported line card types. In case of
>> >> # netdevsim, these are 3.
>> >> 
>> >> $ devlink lc provision netdevsim/netdevsim10 lc 0 type card4ports
>> >
>> >Why do we need a separate command for that? You actually introduced
>> >'DEVLINK_CMD_LINECARD_SET' in patch #1, but it's never used.
>> >
>> >I prefer:
>> >
>> >devlink lc set netdevsim/netdevsim10 index 0 state provision type card4ports
>> 
>> This is misleading. This is actually not setting state. The state gets
>> changed upon successful provisioning process. Also, one may think that
>> he can set other states, but he can't. I don't like this at all :/
>
>So make state a read-only attribute. You really only care about setting
>the type.
>
>To provision:
>
># devlink lc set netdevsim/netdevsim10 index 0 type card4ports
>
>To unprovsion:
>
># devlink lc set netdevsim/netdevsim10 index 0 type none
>
>Or:
>
># devlink lc set netdevsim/netdevsim10 index 0 notype

Hmm, okay, that might work. And I can add state "FAILED_PROVISION" what
would indicate that after the type was set by the user, driver was not
able to successfully provision. The the user has to set "notype" & "type
x" again. Sounds good?


>
>> 
>> 
>> >devlink lc set netdevsim/netdevsim10 index 0 state unprovision
>> >
>> >It is consistent with the GET/SET/NEW/DEL pattern used by other
>> >commands.
>> 
>> Not really, see split port for example. This is similar to that.
>
>It's not. The split command creates new objects whereas this command
>modifies an existing object.

You are right.


>
>> 
>> >
>> >> $ devlink lc
>> >> netdevsim/netdevsim10:
>> >>   lc 0 state provisioned type card4ports
>> >>     supported_types:
>> >>        card1port card2ports card4ports
>> >>   lc 1 state unprovisioned
>> >>     supported_types:
>> >>        card1port card2ports card4ports
>> >> $ devlink port
>> >> netdevsim/netdevsim10/1000: type eth netdev eni10nl0p1 flavour physical lc 0 port 1 splittable false
>> >> netdevsim/netdevsim10/1001: type eth netdev eni10nl0p2 flavour physical lc 0 port 2 splittable false
>> >> netdevsim/netdevsim10/1002: type eth netdev eni10nl0p3 flavour physical lc 0 port 3 splittable false
>> >> netdevsim/netdevsim10/1003: type eth netdev eni10nl0p4 flavour physical lc 0 port 4 splittable false
>> >> #                                                 ^^                    ^^^^
>> >> #                                     netdev name adjusted          index of a line card this port belongs to
>> >> 
>> >> $ ip link set eni10nl0p1 up 
>> >> $ ip link show eni10nl0p1   
>> >> 165: eni10nl0p1: <NO-CARRIER,BROADCAST,NOARP,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
>> >>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
>> >> 
>> >> # Now activate the line card using debugfs. That emulates plug-in event
>> >> # on real hardware:
>> >> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
>> >> $ ip link show eni10nl0p1
>> >> 165: eni10nl0p1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>> >>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
>> >> # The carrier is UP now.
>> >> 
>> >> Jiri Pirko (10):
>> >>   devlink: add support to create line card and expose to user
>> >>   devlink: implement line card provisioning
>> >>   devlink: implement line card active state
>> >>   devlink: append split port number to the port name
>> >>   devlink: add port to line card relationship set
>> >>   netdevsim: introduce line card support
>> >>   netdevsim: allow port objects to be linked with line cards
>> >>   netdevsim: create devlink line card object and implement provisioning
>> >>   netdevsim: implement line card activation
>> >>   selftests: add netdevsim devlink lc test
>> >> 
>> >>  drivers/net/netdevsim/bus.c                   |  21 +-
>> >>  drivers/net/netdevsim/dev.c                   | 370 ++++++++++++++-
>> >>  drivers/net/netdevsim/netdev.c                |   2 +
>> >>  drivers/net/netdevsim/netdevsim.h             |  23 +
>> >>  include/net/devlink.h                         |  44 ++
>> >>  include/uapi/linux/devlink.h                  |  25 +
>> >>  net/core/devlink.c                            | 443 +++++++++++++++++-
>> >>  .../drivers/net/netdevsim/devlink.sh          |  62 ++-
>> >>  8 files changed, 964 insertions(+), 26 deletions(-)
>> >> 
>> >> -- 
>> >> 2.26.2
>> >> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-18 13:00           ` Jiri Pirko
@ 2021-01-18 17:59             ` Jakub Kicinski
  2021-01-19 11:51               ` Jiri Pirko
  2021-01-18 22:55             ` David Ahern
  1 sibling, 1 reply; 80+ messages in thread
From: Jakub Kicinski @ 2021-01-18 17:59 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, jacob.e.keller, roopa, mlxsw

On Mon, 18 Jan 2021 14:00:09 +0100 Jiri Pirko wrote:
> >> >Or to put it differently IMO the netdev should be provisioned if the
> >> >system has a port into which user can plug in a cable. When there is     
> >> 
> >> Not really. For slit cables, the ports are provisioned not matter which
> >> cable is connected, slitter 1->2/1->4 or 1->1 cable.
> >> 
> >>   
> >> >a line card-sized hole in the chassis, I'd be surprised to see ports.
> >> >
> >> >That said I never worked with real world routers so maybe that's what
> >> >they do. Maybe some with a Cisco router in the basement can tell us? :)    
> >> 
> >> The need for provision/pre-configure splitter/linecard is that the
> >> ports/netdevices do not disapper/reappear when you replace
> >> splitter/linecard. Consider a faulty linecard with one port burned. You
> >> just want to replace it with new one. And in that case, you really don't
> >> want kernel to remove netdevices and possibly mess up routing for
> >> example.  
> >
> >Having a single burned port sounds like a relatively rare scenario.  
> 
> Hmm, rare in scale is common...

Sure but at a scale of million switches it doesn't matter if a couple
are re-configuring their routing.

> >Reconfiguring routing is not the end of the world.  
> 
> Well, yes, but you don't really want netdevices to come and go then you
> plug in/out cables/modules. That's why we have split implemented as we
> do. I don't understand why do you think linecards are different.

If I have an unused port it will still show up as a netdev.
If I have an unused phymod slot w/ a slot cover in it, why would there
be a netdev? Our definition of a physical port is something like "a
socket for a networking cable on the outside of the device". With your
code I can "provision" a phymod and there is no whole to plug in a
cable. If we follow the same logic, if I have a server with PCIe
hotplug, why can't I "provision" some netdevs for a NIC that I will
plug in later?

> Plus, I'm not really sure that our hw can report the type, will check.

I think that's key.

> One way or another, I think that both configuration flows have valid
> usecase. Some user may want pre-configuration, some user may want auto.
> Btw, it is possible to implement splitter cable in auto mode as well.

Auto as in iterate over possible configs until link up? That's nasty.

> >> >If the device really needs this configuration / can't detect things
> >> >automatically, then we gotta do something like what you have.
> >> >The only question is do we still want to call it a line card.
> >> >Sounds more like a front panel module. At Netronome we called 
> >> >those phymods.    
> >> 
> >> Sure, the name is up to the discussion. We call it "linecard"
> >> internally. I don't care about the name.  
> >
> >Yeah, let's call it something more appropriate to indicate its
> >breakout/retimer/gearbox nature, and we'll be good :)  
> 
> Well, it can contain much more. It can contain a smartnic/fpga/whatever
> for example. Not sure we can find something that fits to all cases.
> I was thinking about it in the past, I think that the linecard is quite
> appropriate. It connects with lines/lanes, and it does something,
> either phy/gearbox, or just interconnects the lanes using smartnic/fpga
> for example.

If it has a FPGA / NPU in it, it's definitely auto-discoverable. 
I don't understand why you think that it's okay to "provision" NICs
which aren't there but only for this particular use case.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
                   ` (13 preceding siblings ...)
  2021-01-15 15:43 ` Ido Schimmel
@ 2021-01-18 18:01 ` Edwin Peer
  2021-01-18 22:57   ` David Ahern
  14 siblings, 1 reply; 80+ messages in thread
From: Edwin Peer @ 2021-01-18 18:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, David S . Miller, Jakub Kicinski, Jacob Keller, roopa, mlxsw

[-- Attachment #1: Type: text/plain, Size: 1259 bytes --]

On Wed, Jan 13, 2021 at 4:14 AM Jiri Pirko <jiri@resnulli.us> wrote:

> To resolve this, a concept of "provisioning" is introduced.
> The user may "provision" certain slot with a line card type.
> Driver then creates all instances (devlink ports, netdevices, etc)
> related to this line card type. The carrier of netdevices stays down.
> Once the line card is inserted and activated, the carrier of the
> related netdevices goes up.

Do we need to start distinguishing different reasons for carrier down,
or have some kind of device not ready state instead?

I'm facing a similar issue with NIC firmware that isn't yet ready by
device open time, but have been resisting the urge to lie to the stack
about the state of the device and use link state as the next gate.
Sure, most things will just work most of the time, but the problems
with this approach are manifold. Firstly, at least in the NIC case,
the user may confuse this state for some kind of cable issue and go
looking in the wrong place for a solution. But, there are also several
ways the initialization can fail after this point and now the device
is administratively UP, but can never be UP, with no sanctioned way to
communicate the failure. Aren't the issues here similar?

Regards,
Edwin Peer

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4160 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-18 13:00           ` Jiri Pirko
  2021-01-18 17:59             ` Jakub Kicinski
@ 2021-01-18 22:55             ` David Ahern
  2021-01-22  8:01               ` Jiri Pirko
  1 sibling, 1 reply; 80+ messages in thread
From: David Ahern @ 2021-01-18 22:55 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski; +Cc: netdev, davem, jacob.e.keller, roopa, mlxsw

On 1/18/21 6:00 AM, Jiri Pirko wrote:
> 
>> Reconfiguring routing is not the end of the world.
> Well, yes, but you don't really want netdevices to come and go then you
> plug in/out cables/modules. That's why we have split implemented as we

And you don't want a routing daemon to use netdevices which are not
valid due to non-existence. Best case with what you want is carrier down
on the LC's netdevices and that destroys routing.

> do. I don't understand why do you think linecards are different.

I still don't get why you expect linecards to be different than any
other hotplug device.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-18 18:01 ` Edwin Peer
@ 2021-01-18 22:57   ` David Ahern
  2021-01-18 23:40     ` Edwin Peer
  0 siblings, 1 reply; 80+ messages in thread
From: David Ahern @ 2021-01-18 22:57 UTC (permalink / raw)
  To: Edwin Peer, Jiri Pirko
  Cc: netdev, David S . Miller, Jakub Kicinski, Jacob Keller, roopa, mlxsw

On 1/18/21 11:01 AM, Edwin Peer wrote:
> I'm facing a similar issue with NIC firmware that isn't yet ready by
> device open time, but have been resisting the urge to lie to the stack

why not have the ndo_open return -EBUSY or -EAGAIN to tell S/W to try
again 'later'?

> about the state of the device and use link state as the next gate.
> Sure, most things will just work most of the time, but the problems
> with this approach are manifold. Firstly, at least in the NIC case,
> the user may confuse this state for some kind of cable issue and go
> looking in the wrong place for a solution. But, there are also several
> ways the initialization can fail after this point and now the device
> is administratively UP, but can never be UP, with no sanctioned way to
> communicate the failure. Aren't the issues here similar?


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-18 22:57   ` David Ahern
@ 2021-01-18 23:40     ` Edwin Peer
  2021-01-19  2:39       ` David Ahern
  0 siblings, 1 reply; 80+ messages in thread
From: Edwin Peer @ 2021-01-18 23:40 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, netdev, David S . Miller, Jakub Kicinski,
	Jacob Keller, roopa, mlxsw

[-- Attachment #1: Type: text/plain, Size: 681 bytes --]

On Mon, Jan 18, 2021 at 2:57 PM David Ahern <dsahern@gmail.com> wrote:

> On 1/18/21 11:01 AM, Edwin Peer wrote:
> > I'm facing a similar issue with NIC firmware that isn't yet ready by
> > device open time, but have been resisting the urge to lie to the stack
>
> why not have the ndo_open return -EBUSY or -EAGAIN to tell S/W to try
> again 'later'?

Indeed, this is what we ended up doing, although we still need to
confirm Network Manager, systemd and whatever else our customers might
use do the necessary to satisfy the user requirement to handle the
delayed init.

Only reason I piped up is that this line card thing seems to introduce
a similar issue.

Regards,
Edwin Peer

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4160 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-18 23:40     ` Edwin Peer
@ 2021-01-19  2:39       ` David Ahern
  2021-01-19  5:06         ` Edwin Peer
  0 siblings, 1 reply; 80+ messages in thread
From: David Ahern @ 2021-01-19  2:39 UTC (permalink / raw)
  To: Edwin Peer
  Cc: Jiri Pirko, netdev, David S . Miller, Jakub Kicinski,
	Jacob Keller, roopa, mlxsw

On 1/18/21 4:40 PM, Edwin Peer wrote:
> On Mon, Jan 18, 2021 at 2:57 PM David Ahern <dsahern@gmail.com> wrote:
> 
>> On 1/18/21 11:01 AM, Edwin Peer wrote:
>>> I'm facing a similar issue with NIC firmware that isn't yet ready by
>>> device open time, but have been resisting the urge to lie to the stack
>>
>> why not have the ndo_open return -EBUSY or -EAGAIN to tell S/W to try
>> again 'later'?
> 
> Indeed, this is what we ended up doing, although we still need to
> confirm Network Manager, systemd and whatever else our customers might
> use do the necessary to satisfy the user requirement to handle the
> delayed init.

I am not surprised about the issue - boot times have been improved and
devices have gotten more complicated. And I was wondering how network
managers (add ifupdown{2} to that list) would handle an EAGAIN. You
could have an event sent -- e.g., IFLA_EVENT_FW_READY -- to allow
managers to avoid polling. Redundant for multiple netdev's per device,
but makes it event driven.

> 
> Only reason I piped up is that this line card thing seems to introduce
> a similar issue.

Seems reasonable.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-19  2:39       ` David Ahern
@ 2021-01-19  5:06         ` Edwin Peer
  0 siblings, 0 replies; 80+ messages in thread
From: Edwin Peer @ 2021-01-19  5:06 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, netdev, David S . Miller, Jakub Kicinski,
	Jacob Keller, roopa, mlxsw

[-- Attachment #1: Type: text/plain, Size: 1192 bytes --]

On Mon, Jan 18, 2021 at 6:39 PM David Ahern <dsahern@gmail.com> wrote:

> > Indeed, this is what we ended up doing, although we still need to
> > confirm Network Manager, systemd and whatever else our customers might
> > use do the necessary to satisfy the user requirement to handle the
> > delayed init.
>
> I am not surprised about the issue - boot times have been improved and
> devices have gotten more complicated. And I was wondering how network
> managers (add ifupdown{2} to that list) would handle an EAGAIN. You
> could have an event sent -- e.g., IFLA_EVENT_FW_READY -- to allow
> managers to avoid polling. Redundant for multiple netdev's per device,
> but makes it event driven.

This is what I was hinting at by talking about another device state.
For that, there would necessarily need to be an event to inform user
space about the transition out of said state into normal open/up. Of
course, the tools would need to be updated to know about such a new
event too. EAGAIN does seem simpler. In our case, we don't expect to
be polling too long, or frequently for that matter. It is
exceptionally rare that the firmware wouldn't be ready, but it can
happen.

Regards,
Edwin Peer

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4160 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-18 17:59             ` Jakub Kicinski
@ 2021-01-19 11:51               ` Jiri Pirko
  0 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-19 11:51 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem, jacob.e.keller, roopa, mlxsw

Mon, Jan 18, 2021 at 06:59:28PM CET, kuba@kernel.org wrote:
>On Mon, 18 Jan 2021 14:00:09 +0100 Jiri Pirko wrote:
>> >> >Or to put it differently IMO the netdev should be provisioned if the
>> >> >system has a port into which user can plug in a cable. When there is     
>> >> 
>> >> Not really. For slit cables, the ports are provisioned not matter which
>> >> cable is connected, slitter 1->2/1->4 or 1->1 cable.
>> >> 
>> >>   
>> >> >a line card-sized hole in the chassis, I'd be surprised to see ports.
>> >> >
>> >> >That said I never worked with real world routers so maybe that's what
>> >> >they do. Maybe some with a Cisco router in the basement can tell us? :)    
>> >> 
>> >> The need for provision/pre-configure splitter/linecard is that the
>> >> ports/netdevices do not disapper/reappear when you replace
>> >> splitter/linecard. Consider a faulty linecard with one port burned. You
>> >> just want to replace it with new one. And in that case, you really don't
>> >> want kernel to remove netdevices and possibly mess up routing for
>> >> example.  
>> >
>> >Having a single burned port sounds like a relatively rare scenario.  
>> 
>> Hmm, rare in scale is common...
>
>Sure but at a scale of million switches it doesn't matter if a couple
>are re-configuring their routing.
>
>> >Reconfiguring routing is not the end of the world.  
>> 
>> Well, yes, but you don't really want netdevices to come and go then you
>> plug in/out cables/modules. That's why we have split implemented as we
>> do. I don't understand why do you think linecards are different.
>
>If I have an unused port it will still show up as a netdev.
>If I have an unused phymod slot w/ a slot cover in it, why would there
>be a netdev? Our definition of a physical port is something like "a
>socket for a networking cable on the outside of the device". With your
>code I can "provision" a phymod and there is no whole to plug in a
>cable. If we follow the same logic, if I have a server with PCIe
>hotplug, why can't I "provision" some netdevs for a NIC that I will
>plug in later?
>
>> Plus, I'm not really sure that our hw can report the type, will check.
>
>I think that's key.

So, it can't. The driver is only aware of "activation" of the linecard
being successful or not.


>
>> One way or another, I think that both configuration flows have valid
>> usecase. Some user may want pre-configuration, some user may want auto.
>> Btw, it is possible to implement splitter cable in auto mode as well.
>
>Auto as in iterate over possible configs until link up? That's nasty.
>
>> >> >If the device really needs this configuration / can't detect things
>> >> >automatically, then we gotta do something like what you have.
>> >> >The only question is do we still want to call it a line card.
>> >> >Sounds more like a front panel module. At Netronome we called 
>> >> >those phymods.    
>> >> 
>> >> Sure, the name is up to the discussion. We call it "linecard"
>> >> internally. I don't care about the name.  
>> >
>> >Yeah, let's call it something more appropriate to indicate its
>> >breakout/retimer/gearbox nature, and we'll be good :)  
>> 
>> Well, it can contain much more. It can contain a smartnic/fpga/whatever
>> for example. Not sure we can find something that fits to all cases.
>> I was thinking about it in the past, I think that the linecard is quite
>> appropriate. It connects with lines/lanes, and it does something,
>> either phy/gearbox, or just interconnects the lanes using smartnic/fpga
>> for example.
>
>If it has a FPGA / NPU in it, it's definitely auto-discoverable. 
>I don't understand why you think that it's okay to "provision" NICs
>which aren't there but only for this particular use case.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-14  2:07 ` [patch net-next RFC 00/10] introduce line card support for modular switch Andrew Lunn
  2021-01-14  7:39   ` Jiri Pirko
@ 2021-01-19 11:56   ` Jiri Pirko
  2021-01-19 14:51     ` Andrew Lunn
  2021-01-19 16:23     ` David Ahern
  1 sibling, 2 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-19 11:56 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

Thu, Jan 14, 2021 at 03:07:18AM CET, andrew@lunn.ch wrote:
>> $ devlink lc provision netdevsim/netdevsim10 lc 0 type card4ports
>> $ devlink lc
>> netdevsim/netdevsim10:
>>   lc 0 state provisioned type card4ports
>>     supported_types:
>>        card1port card2ports card4ports
>>   lc 1 state unprovisioned
>>     supported_types:
>>        card1port card2ports card4ports
>
>Hi Jiri
>
>> # Now activate the line card using debugfs. That emulates plug-in event
>> # on real hardware:
>> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
>> $ ip link show eni10nl0p1
>> 165: eni10nl0p1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
>> # The carrier is UP now.
>
>What is missing from the devlink lc view is what line card is actually
>in the slot. Say if i provision for a card4port, but actually insert a
>card2port. It would be nice to have something like:

I checked, our hw does not support that. Only provides info that
linecard activation was/wasn't successful.


>
> $ devlink lc
> netdevsim/netdevsim10:
>   lc 0 state provisioned type card4ports
>     supported_types:
>        card1port card2ports card4ports
>     inserted_type:
>        card2ports;
>   lc 1 state unprovisioned
>     supported_types:
>        card1port card2ports card4ports
>     inserted_type:
>        None
>
>I assume if i prevision for card4ports but actually install a
>card2ports, all the interfaces stay down?
>
>Maybe
>
>> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
>
>should actually be
>    echo "card2ports" > /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
>
>so you can emulate somebody putting the wrong card in the slot?
>
>    Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-19 11:56   ` Jiri Pirko
@ 2021-01-19 14:51     ` Andrew Lunn
  2021-01-20  8:36       ` Jiri Pirko
  2021-01-19 16:23     ` David Ahern
  1 sibling, 1 reply; 80+ messages in thread
From: Andrew Lunn @ 2021-01-19 14:51 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

On Tue, Jan 19, 2021 at 12:56:10PM +0100, Jiri Pirko wrote:
> Thu, Jan 14, 2021 at 03:07:18AM CET, andrew@lunn.ch wrote:
> >> $ devlink lc provision netdevsim/netdevsim10 lc 0 type card4ports
> >> $ devlink lc
> >> netdevsim/netdevsim10:
> >>   lc 0 state provisioned type card4ports
> >>     supported_types:
> >>        card1port card2ports card4ports
> >>   lc 1 state unprovisioned
> >>     supported_types:
> >>        card1port card2ports card4ports
> >
> >Hi Jiri
> >
> >> # Now activate the line card using debugfs. That emulates plug-in event
> >> # on real hardware:
> >> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
> >> $ ip link show eni10nl0p1
> >> 165: eni10nl0p1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> >>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
> >> # The carrier is UP now.
> >
> >What is missing from the devlink lc view is what line card is actually
> >in the slot. Say if i provision for a card4port, but actually insert a
> >card2port. It would be nice to have something like:
> 
> I checked, our hw does not support that. Only provides info that
> linecard activation was/wasn't successful.

Hi Jiri

Is this a firmware limitation? There is no API to extract the
information from the firmware to the host? The firmware itself knows
there is a mismatch and refuses to configure the line card, and
prevents the MAC going up?

Even if you cannot do this now, it seems likely in future firmware
versions you will be able to, so maybe at least define the netlink
attributes now? As well as attributes indicating activation was
successful.

	Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-19 11:56   ` Jiri Pirko
  2021-01-19 14:51     ` Andrew Lunn
@ 2021-01-19 16:23     ` David Ahern
  2021-01-20  8:37       ` Jiri Pirko
  1 sibling, 1 reply; 80+ messages in thread
From: David Ahern @ 2021-01-19 16:23 UTC (permalink / raw)
  To: Jiri Pirko, Andrew Lunn; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

On 1/19/21 4:56 AM, Jiri Pirko wrote:
> Thu, Jan 14, 2021 at 03:07:18AM CET, andrew@lunn.ch wrote:
>>> $ devlink lc provision netdevsim/netdevsim10 lc 0 type card4ports
>>> $ devlink lc
>>> netdevsim/netdevsim10:
>>>   lc 0 state provisioned type card4ports
>>>     supported_types:
>>>        card1port card2ports card4ports
>>>   lc 1 state unprovisioned
>>>     supported_types:
>>>        card1port card2ports card4ports
>>
>> Hi Jiri
>>
>>> # Now activate the line card using debugfs. That emulates plug-in event
>>> # on real hardware:
>>> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
>>> $ ip link show eni10nl0p1
>>> 165: eni10nl0p1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>>>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
>>> # The carrier is UP now.
>>
>> What is missing from the devlink lc view is what line card is actually
>> in the slot. Say if i provision for a card4port, but actually insert a
>> card2port. It would be nice to have something like:
> 
> I checked, our hw does not support that. Only provides info that
> linecard activation was/wasn't successful.
> 

There is no way for the supervisor / management card to probe and see
what card is actually inserted in a given slot? That seems like a
serious design deficiency. What about some agent running on the line
card talking to an agent on the supervisor to provide that information?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-19 14:51     ` Andrew Lunn
@ 2021-01-20  8:36       ` Jiri Pirko
  2021-01-20 13:56         ` Andrew Lunn
  0 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-20  8:36 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

Tue, Jan 19, 2021 at 03:51:49PM CET, andrew@lunn.ch wrote:
>On Tue, Jan 19, 2021 at 12:56:10PM +0100, Jiri Pirko wrote:
>> Thu, Jan 14, 2021 at 03:07:18AM CET, andrew@lunn.ch wrote:
>> >> $ devlink lc provision netdevsim/netdevsim10 lc 0 type card4ports
>> >> $ devlink lc
>> >> netdevsim/netdevsim10:
>> >>   lc 0 state provisioned type card4ports
>> >>     supported_types:
>> >>        card1port card2ports card4ports
>> >>   lc 1 state unprovisioned
>> >>     supported_types:
>> >>        card1port card2ports card4ports
>> >
>> >Hi Jiri
>> >
>> >> # Now activate the line card using debugfs. That emulates plug-in event
>> >> # on real hardware:
>> >> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
>> >> $ ip link show eni10nl0p1
>> >> 165: eni10nl0p1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>> >>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
>> >> # The carrier is UP now.
>> >
>> >What is missing from the devlink lc view is what line card is actually
>> >in the slot. Say if i provision for a card4port, but actually insert a
>> >card2port. It would be nice to have something like:
>> 
>> I checked, our hw does not support that. Only provides info that
>> linecard activation was/wasn't successful.
>
>Hi Jiri
>
>Is this a firmware limitation? There is no API to extract the
>information from the firmware to the host? The firmware itself knows
>there is a mismatch and refuses to configure the line card, and
>prevents the MAC going up?

No, the FW does not know. The ASIC is not physically able to get the
linecard type. Yes, it is odd, I agree. The linecard type is known to
the driver which operates on i2c. This driver takes care of power
management of the linecard, among other tasks.


>
>Even if you cannot do this now, it seems likely in future firmware
>versions you will be able to, so maybe at least define the netlink

Sure, for netdevsim that is not problem. Our current hw does not support
it, the future may.


>attributes now? As well as attributes indicating activation was
>successful.

State "ACTIVATED" is that indication. It is in this RFC.


>
>	Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-19 16:23     ` David Ahern
@ 2021-01-20  8:37       ` Jiri Pirko
  0 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-20  8:37 UTC (permalink / raw)
  To: David Ahern
  Cc: Andrew Lunn, netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

Tue, Jan 19, 2021 at 05:23:19PM CET, dsahern@gmail.com wrote:
>On 1/19/21 4:56 AM, Jiri Pirko wrote:
>> Thu, Jan 14, 2021 at 03:07:18AM CET, andrew@lunn.ch wrote:
>>>> $ devlink lc provision netdevsim/netdevsim10 lc 0 type card4ports
>>>> $ devlink lc
>>>> netdevsim/netdevsim10:
>>>>   lc 0 state provisioned type card4ports
>>>>     supported_types:
>>>>        card1port card2ports card4ports
>>>>   lc 1 state unprovisioned
>>>>     supported_types:
>>>>        card1port card2ports card4ports
>>>
>>> Hi Jiri
>>>
>>>> # Now activate the line card using debugfs. That emulates plug-in event
>>>> # on real hardware:
>>>> $ echo "Y"> /sys/kernel/debug/netdevsim/netdevsim10/linecards/0/active
>>>> $ ip link show eni10nl0p1
>>>> 165: eni10nl0p1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>>>>     link/ether 7e:2d:05:93:d3:d1 brd ff:ff:ff:ff:ff:ff
>>>> # The carrier is UP now.
>>>
>>> What is missing from the devlink lc view is what line card is actually
>>> in the slot. Say if i provision for a card4port, but actually insert a
>>> card2port. It would be nice to have something like:
>> 
>> I checked, our hw does not support that. Only provides info that
>> linecard activation was/wasn't successful.
>> 
>
>There is no way for the supervisor / management card to probe and see
>what card is actually inserted in a given slot? That seems like a
>serious design deficiency. What about some agent running on the line
>card talking to an agent on the supervisor to provide that information?

The ASIC does not have this info. The linecard type is exposed over i2c
interface, different driver sits on top of it.
I agree it is odd, but that is how it is for our hw, unfortunatelly.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-20  8:36       ` Jiri Pirko
@ 2021-01-20 13:56         ` Andrew Lunn
  2021-01-20 23:41           ` Jakub Kicinski
  0 siblings, 1 reply; 80+ messages in thread
From: Andrew Lunn @ 2021-01-20 13:56 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, kuba, jacob.e.keller, roopa, mlxsw

> No, the FW does not know. The ASIC is not physically able to get the
> linecard type. Yes, it is odd, I agree. The linecard type is known to
> the driver which operates on i2c. This driver takes care of power
> management of the linecard, among other tasks.

So what does activated actually mean for your hardware? It seems to
mean something like: Some random card has been plugged in, we have no
idea what, but it has power, and we have enabled the MACs as
provisioned, which if you are lucky might match the hardware?

The foundations of this feature seems dubious.

    Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-20 13:56         ` Andrew Lunn
@ 2021-01-20 23:41           ` Jakub Kicinski
  2021-01-21  0:01             ` Andrew Lunn
  2021-01-21 15:32             ` Jiri Pirko
  0 siblings, 2 replies; 80+ messages in thread
From: Jakub Kicinski @ 2021-01-20 23:41 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Jiri Pirko, netdev, davem, jacob.e.keller, roopa, mlxsw

On Wed, 20 Jan 2021 14:56:46 +0100 Andrew Lunn wrote:
> > No, the FW does not know. The ASIC is not physically able to get the
> > linecard type. Yes, it is odd, I agree. The linecard type is known to
> > the driver which operates on i2c. This driver takes care of power
> > management of the linecard, among other tasks.  
> 
> So what does activated actually mean for your hardware? It seems to
> mean something like: Some random card has been plugged in, we have no
> idea what, but it has power, and we have enabled the MACs as
> provisioned, which if you are lucky might match the hardware?
> 
> The foundations of this feature seems dubious.

But Jiri also says "The linecard type is known to the driver which
operates on i2c." which sounds like there is some i2c driver (in user
space?) which talks to the card and _does_ have the info? Maybe I'm
misreading it. What's the i2c driver?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-20 23:41           ` Jakub Kicinski
@ 2021-01-21  0:01             ` Andrew Lunn
  2021-01-21  0:16               ` Jakub Kicinski
  2021-01-21 15:34               ` Jiri Pirko
  2021-01-21 15:32             ` Jiri Pirko
  1 sibling, 2 replies; 80+ messages in thread
From: Andrew Lunn @ 2021-01-21  0:01 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, netdev, davem, jacob.e.keller, roopa, mlxsw

On Wed, Jan 20, 2021 at 03:41:58PM -0800, Jakub Kicinski wrote:
> On Wed, 20 Jan 2021 14:56:46 +0100 Andrew Lunn wrote:
> > > No, the FW does not know. The ASIC is not physically able to get the
> > > linecard type. Yes, it is odd, I agree. The linecard type is known to
> > > the driver which operates on i2c. This driver takes care of power
> > > management of the linecard, among other tasks.  
> > 
> > So what does activated actually mean for your hardware? It seems to
> > mean something like: Some random card has been plugged in, we have no
> > idea what, but it has power, and we have enabled the MACs as
> > provisioned, which if you are lucky might match the hardware?
> > 
> > The foundations of this feature seems dubious.
> 
> But Jiri also says "The linecard type is known to the driver which
> operates on i2c." which sounds like there is some i2c driver (in user
> space?) which talks to the card and _does_ have the info? Maybe I'm
> misreading it. What's the i2c driver?

Hi Jakub

A complete guess, but i think it will be the BMC, not the ASIC. There
have been patches from Mellanox in the past for a BMC, i think sent to
arm-soc, for the ASPEED devices often used as BMCs. And the BMC is
often the device doing power management. So what might be missing is
an interface between the driver and the BMC. But that then makes the
driver system specific. A OEM who buys ASICs and makes their own board
could have their own BMC running there own BMC firmware.

All speculation...

      Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-21  0:01             ` Andrew Lunn
@ 2021-01-21  0:16               ` Jakub Kicinski
  2021-01-21 15:34               ` Jiri Pirko
  1 sibling, 0 replies; 80+ messages in thread
From: Jakub Kicinski @ 2021-01-21  0:16 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Jiri Pirko, netdev, davem, jacob.e.keller, roopa, mlxsw

On Thu, 21 Jan 2021 01:01:21 +0100 Andrew Lunn wrote:
> On Wed, Jan 20, 2021 at 03:41:58PM -0800, Jakub Kicinski wrote:
> > On Wed, 20 Jan 2021 14:56:46 +0100 Andrew Lunn wrote:  
> > > > No, the FW does not know. The ASIC is not physically able to get the
> > > > linecard type. Yes, it is odd, I agree. The linecard type is known to
> > > > the driver which operates on i2c. This driver takes care of power
> > > > management of the linecard, among other tasks.    
> > > 
> > > So what does activated actually mean for your hardware? It seems to
> > > mean something like: Some random card has been plugged in, we have no
> > > idea what, but it has power, and we have enabled the MACs as
> > > provisioned, which if you are lucky might match the hardware?
> > > 
> > > The foundations of this feature seems dubious.  
> > 
> > But Jiri also says "The linecard type is known to the driver which
> > operates on i2c." which sounds like there is some i2c driver (in user
> > space?) which talks to the card and _does_ have the info? Maybe I'm
> > misreading it. What's the i2c driver?  
> 
> Hi Jakub
> 
> A complete guess, but i think it will be the BMC, not the ASIC. There
> have been patches from Mellanox in the past for a BMC, i think sent to
> arm-soc, for the ASPEED devices often used as BMCs. And the BMC is
> often the device doing power management. So what might be missing is
> an interface between the driver and the BMC. But that then makes the
> driver system specific. A OEM who buys ASICs and makes their own board
> could have their own BMC running there own BMC firmware.
> 
> All speculation...

I see that does make sense 🤔 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-20 23:41           ` Jakub Kicinski
  2021-01-21  0:01             ` Andrew Lunn
@ 2021-01-21 15:32             ` Jiri Pirko
  2021-01-21 16:38               ` David Ahern
  1 sibling, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-21 15:32 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Andrew Lunn, netdev, davem, jacob.e.keller, roopa, mlxsw

Thu, Jan 21, 2021 at 12:41:58AM CET, kuba@kernel.org wrote:
>On Wed, 20 Jan 2021 14:56:46 +0100 Andrew Lunn wrote:
>> > No, the FW does not know. The ASIC is not physically able to get the
>> > linecard type. Yes, it is odd, I agree. The linecard type is known to
>> > the driver which operates on i2c. This driver takes care of power
>> > management of the linecard, among other tasks.  
>> 
>> So what does activated actually mean for your hardware? It seems to
>> mean something like: Some random card has been plugged in, we have no
>> idea what, but it has power, and we have enabled the MACs as
>> provisioned, which if you are lucky might match the hardware?
>> 
>> The foundations of this feature seems dubious.
>
>But Jiri also says "The linecard type is known to the driver which
>operates on i2c." which sounds like there is some i2c driver (in user
>space?) which talks to the card and _does_ have the info? Maybe I'm
>misreading it. What's the i2c driver?

That is Vadim's i2c kernel driver, this is going to upstream.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-21  0:01             ` Andrew Lunn
  2021-01-21  0:16               ` Jakub Kicinski
@ 2021-01-21 15:34               ` Jiri Pirko
  1 sibling, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-21 15:34 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Jakub Kicinski, netdev, davem, jacob.e.keller, roopa, mlxsw

Thu, Jan 21, 2021 at 01:01:21AM CET, andrew@lunn.ch wrote:
>On Wed, Jan 20, 2021 at 03:41:58PM -0800, Jakub Kicinski wrote:
>> On Wed, 20 Jan 2021 14:56:46 +0100 Andrew Lunn wrote:
>> > > No, the FW does not know. The ASIC is not physically able to get the
>> > > linecard type. Yes, it is odd, I agree. The linecard type is known to
>> > > the driver which operates on i2c. This driver takes care of power
>> > > management of the linecard, among other tasks.  
>> > 
>> > So what does activated actually mean for your hardware? It seems to
>> > mean something like: Some random card has been plugged in, we have no
>> > idea what, but it has power, and we have enabled the MACs as
>> > provisioned, which if you are lucky might match the hardware?
>> > 
>> > The foundations of this feature seems dubious.
>> 
>> But Jiri also says "The linecard type is known to the driver which
>> operates on i2c." which sounds like there is some i2c driver (in user
>> space?) which talks to the card and _does_ have the info? Maybe I'm
>> misreading it. What's the i2c driver?
>
>Hi Jakub
>
>A complete guess, but i think it will be the BMC, not the ASIC. There
>have been patches from Mellanox in the past for a BMC, i think sent to
>arm-soc, for the ASPEED devices often used as BMCs. And the BMC is
>often the device doing power management. So what might be missing is
>an interface between the driver and the BMC. But that then makes the
>driver system specific. A OEM who buys ASICs and makes their own board
>could have their own BMC running there own BMC firmware.
>
>All speculation...

Basically all correct.

The thing is mlxsw and the i2c driver cannot talk to each other:
1) It would be ugly
2) They may likely be on a different host


>
>      Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-21 15:32             ` Jiri Pirko
@ 2021-01-21 16:38               ` David Ahern
  2021-01-22  7:28                 ` Jiri Pirko
  2021-01-22  8:05                 ` Jiri Pirko
  0 siblings, 2 replies; 80+ messages in thread
From: David Ahern @ 2021-01-21 16:38 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: Andrew Lunn, netdev, davem, jacob.e.keller, roopa, mlxsw

On 1/21/21 8:32 AM, Jiri Pirko wrote:
> Thu, Jan 21, 2021 at 12:41:58AM CET, kuba@kernel.org wrote:
>> On Wed, 20 Jan 2021 14:56:46 +0100 Andrew Lunn wrote:
>>>> No, the FW does not know. The ASIC is not physically able to get the
>>>> linecard type. Yes, it is odd, I agree. The linecard type is known to
>>>> the driver which operates on i2c. This driver takes care of power
>>>> management of the linecard, among other tasks.  
>>>
>>> So what does activated actually mean for your hardware? It seems to
>>> mean something like: Some random card has been plugged in, we have no
>>> idea what, but it has power, and we have enabled the MACs as
>>> provisioned, which if you are lucky might match the hardware?
>>>
>>> The foundations of this feature seems dubious.
>>
>> But Jiri also says "The linecard type is known to the driver which
>> operates on i2c." which sounds like there is some i2c driver (in user
>> space?) which talks to the card and _does_ have the info? Maybe I'm
>> misreading it. What's the i2c driver?
> 
> That is Vadim's i2c kernel driver, this is going to upstream.
> 

This pre-provisioning concept makes a fragile design to work around h/w
shortcomings. You really need a way for the management card to know
exactly what was plugged in to a slot so the control plane S/W can
respond accordingly. Surely there is a way for processes on the LC to
communicate with a process on the management card - even if it is inband
packets with special headers.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-21 16:38               ` David Ahern
@ 2021-01-22  7:28                 ` Jiri Pirko
  2021-01-22 14:13                   ` Andrew Lunn
  2021-01-22  8:05                 ` Jiri Pirko
  1 sibling, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-22  7:28 UTC (permalink / raw)
  To: David Ahern
  Cc: Jakub Kicinski, Andrew Lunn, netdev, davem, jacob.e.keller, roopa, mlxsw

Thu, Jan 21, 2021 at 05:38:40PM CET, dsahern@gmail.com wrote:
>On 1/21/21 8:32 AM, Jiri Pirko wrote:
>> Thu, Jan 21, 2021 at 12:41:58AM CET, kuba@kernel.org wrote:
>>> On Wed, 20 Jan 2021 14:56:46 +0100 Andrew Lunn wrote:
>>>>> No, the FW does not know. The ASIC is not physically able to get the
>>>>> linecard type. Yes, it is odd, I agree. The linecard type is known to
>>>>> the driver which operates on i2c. This driver takes care of power
>>>>> management of the linecard, among other tasks.  
>>>>
>>>> So what does activated actually mean for your hardware? It seems to
>>>> mean something like: Some random card has been plugged in, we have no
>>>> idea what, but it has power, and we have enabled the MACs as
>>>> provisioned, which if you are lucky might match the hardware?
>>>>
>>>> The foundations of this feature seems dubious.
>>>
>>> But Jiri also says "The linecard type is known to the driver which
>>> operates on i2c." which sounds like there is some i2c driver (in user
>>> space?) which talks to the card and _does_ have the info? Maybe I'm
>>> misreading it. What's the i2c driver?
>> 
>> That is Vadim's i2c kernel driver, this is going to upstream.
>> 
>
>This pre-provisioning concept makes a fragile design to work around h/w
>shortcomings. You really need a way for the management card to know
>exactly what was plugged in to a slot so the control plane S/W can
>respond accordingly. Surely there is a way for processes on the LC to
>communicate with a process on the management card - even if it is inband
>packets with special headers.

I don't see any way. The userspace is the one who can get the info, from
the i2c driver. The mlxsw driver has no means to get that info itself.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-18 22:55             ` David Ahern
@ 2021-01-22  8:01               ` Jiri Pirko
  0 siblings, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-22  8:01 UTC (permalink / raw)
  To: David Ahern; +Cc: Jakub Kicinski, netdev, davem, jacob.e.keller, roopa, mlxsw

Mon, Jan 18, 2021 at 11:55:45PM CET, dsahern@gmail.com wrote:
>On 1/18/21 6:00 AM, Jiri Pirko wrote:
>> 
>>> Reconfiguring routing is not the end of the world.
>> Well, yes, but you don't really want netdevices to come and go then you
>> plug in/out cables/modules. That's why we have split implemented as we
>
>And you don't want a routing daemon to use netdevices which are not
>valid due to non-existence. Best case with what you want is carrier down
>on the LC's netdevices and that destroys routing.

There are other things. The user may configure the netdev parameters in
advance, like mtu, put it in a bridge, setup TC filters on it etc.
The linecard unplug/plug does not destroy the settings. This is the same
thing with split ports and that is why we have implemented split ports
in "provision" mode as well.


>
>> do. I don't understand why do you think linecards are different.
>
>I still don't get why you expect linecards to be different than any
>other hotplug device.

It it not a device, does not have "struct device" related to it.
It is just a phy part of another device.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-21 16:38               ` David Ahern
  2021-01-22  7:28                 ` Jiri Pirko
@ 2021-01-22  8:05                 ` Jiri Pirko
  1 sibling, 0 replies; 80+ messages in thread
From: Jiri Pirko @ 2021-01-22  8:05 UTC (permalink / raw)
  To: David Ahern
  Cc: Jakub Kicinski, Andrew Lunn, netdev, davem, jacob.e.keller, roopa, mlxsw

Thu, Jan 21, 2021 at 05:38:40PM CET, dsahern@gmail.com wrote:
>On 1/21/21 8:32 AM, Jiri Pirko wrote:
>> Thu, Jan 21, 2021 at 12:41:58AM CET, kuba@kernel.org wrote:
>>> On Wed, 20 Jan 2021 14:56:46 +0100 Andrew Lunn wrote:
>>>>> No, the FW does not know. The ASIC is not physically able to get the
>>>>> linecard type. Yes, it is odd, I agree. The linecard type is known to
>>>>> the driver which operates on i2c. This driver takes care of power
>>>>> management of the linecard, among other tasks.  
>>>>
>>>> So what does activated actually mean for your hardware? It seems to
>>>> mean something like: Some random card has been plugged in, we have no
>>>> idea what, but it has power, and we have enabled the MACs as
>>>> provisioned, which if you are lucky might match the hardware?
>>>>
>>>> The foundations of this feature seems dubious.
>>>
>>> But Jiri also says "The linecard type is known to the driver which
>>> operates on i2c." which sounds like there is some i2c driver (in user
>>> space?) which talks to the card and _does_ have the info? Maybe I'm
>>> misreading it. What's the i2c driver?
>> 
>> That is Vadim's i2c kernel driver, this is going to upstream.
>> 
>
>This pre-provisioning concept makes a fragile design to work around h/w
>shortcomings. You really need a way for the management card to know

Not really. As I replied to you in the other part of this thread, the
linecard is basically very similar to a splitter cable. In a way, it is
a splitter cable. And should be threated in a similar way. As a phy. Not
as a device. Cables are replaceble without netdevice reappearing. This
linecards are the same. Therefore, the concept of provisioning makes
sense for them, as it does for splitter cable.


>exactly what was plugged in to a slot so the control plane S/W can
>respond accordingly. Surely there is a way for processes on the LC to
>communicate with a process on the management card - even if it is inband
>packets with special headers.

If a device is capable of splitter cable/linecard hotplug, sure, that
may be implemented. But the user has to configure it as such, to be
aware that "cable change" may move netdevices around.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-22  7:28                 ` Jiri Pirko
@ 2021-01-22 14:13                   ` Andrew Lunn
  2021-01-26 11:33                     ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Andrew Lunn @ 2021-01-22 14:13 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Ahern, Jakub Kicinski, netdev, davem, jacob.e.keller, roopa, mlxsw

> I don't see any way. The userspace is the one who can get the info, from
> the i2c driver. The mlxsw driver has no means to get that info itself.

Hi Jiri

Please can you tell us more about this i2c driver. Do you have any
architecture pictures?

It is not unknown for one driver to embed another driver inside it. So
the i2c driver could be inside the mlxsw. It is also possible to link
drivers together, the mlxsw could go find the i2c driver and make use
of its services.

   Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-22 14:13                   ` Andrew Lunn
@ 2021-01-26 11:33                     ` Jiri Pirko
  2021-01-26 13:56                       ` Andrew Lunn
  0 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-26 11:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David Ahern, Jakub Kicinski, netdev, davem, jacob.e.keller,
	roopa, mlxsw, vadimp

Fri, Jan 22, 2021 at 03:13:12PM CET, andrew@lunn.ch wrote:
>> I don't see any way. The userspace is the one who can get the info, from
>> the i2c driver. The mlxsw driver has no means to get that info itself.
>
>Hi Jiri
>
>Please can you tell us more about this i2c driver. Do you have any
>architecture pictures?

Quoting Vadim Pasternak:
"
Not upstreamed yet.
It will be mlxreg-lc driver for line card in drivers/platfrom/mellanox and
additional mlxreg-pm for line card powering on/off, setting enable/disable
and handling power off upon thermal shutdown event.
"


>
>It is not unknown for one driver to embed another driver inside it. So
>the i2c driver could be inside the mlxsw. It is also possible to link
>drivers together, the mlxsw could go find the i2c driver and make use
>of its services.

Okay. Do you have examples? How could the kernel figure out the relation
of the instances?


>
>   Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-26 11:33                     ` Jiri Pirko
@ 2021-01-26 13:56                       ` Andrew Lunn
  2021-01-27  7:57                         ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Andrew Lunn @ 2021-01-26 13:56 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Ahern, Jakub Kicinski, netdev, davem, jacob.e.keller,
	roopa, mlxsw, vadimp

On Tue, Jan 26, 2021 at 12:33:26PM +0100, Jiri Pirko wrote:
> Fri, Jan 22, 2021 at 03:13:12PM CET, andrew@lunn.ch wrote:
> >> I don't see any way. The userspace is the one who can get the info, from
> >> the i2c driver. The mlxsw driver has no means to get that info itself.
> >
> >Hi Jiri
> >
> >Please can you tell us more about this i2c driver. Do you have any
> >architecture pictures?
> 
> Quoting Vadim Pasternak:
> "
> Not upstreamed yet.
> It will be mlxreg-lc driver for line card in drivers/platfrom/mellanox and
> additional mlxreg-pm for line card powering on/off, setting enable/disable
> and handling power off upon thermal shutdown event.
> "
> 
> 
> >
> >It is not unknown for one driver to embed another driver inside it. So
> >the i2c driver could be inside the mlxsw. It is also possible to link
> >drivers together, the mlxsw could go find the i2c driver and make use
> >of its services.
> 
> Okay. Do you have examples? How could the kernel figure out the relation
> of the instances?

Hi Jiri

One driver, embedded into another? You actually submitted an example:

commit 6882b0aee180f2797b8803bdf699aa45c2e5f2d6
Author: Vadim Pasternak <vadimp@mellanox.com>
Date:   Wed Nov 16 15:20:44 2016 +0100

    mlxsw: Introduce support for I2C bus
    
    Add I2C bus implementation for Mellanox Technologies Switch ASICs.
    This includes command interface implementation using input / out mailboxes,
    whose location is retrieved from the firmware during probe time.
    
    Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
    Reviewed-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

There are Linux standard APIs for controlling the power to devices,
the regulator API. So i assume mlxreg-pm will make use of that. There
are also standard APIs for thermal management, which again, mlxreg-pm
should be using. The regulator API allows you to find regulators by
name. So just define a sensible naming convention, and the switch
driver can lookup the regulator, and turn it on/off as needed.

I'm guessing there are no standard Linux API which mlxreg-lc fits. I'm
also not sure it offers anything useful standalone. So i would
actually embed it inside the switchdev driver, and have internal APIs
to get information about the line card.

But i'm missing big picture architecture knowledge here, there could
be reasons why these suggestions don't work.

   Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-26 13:56                       ` Andrew Lunn
@ 2021-01-27  7:57                         ` Jiri Pirko
  2021-01-27 14:14                           ` Andrew Lunn
  0 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-27  7:57 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David Ahern, Jakub Kicinski, netdev, davem, jacob.e.keller,
	roopa, mlxsw, vadimp

Tue, Jan 26, 2021 at 02:56:08PM CET, andrew@lunn.ch wrote:
>On Tue, Jan 26, 2021 at 12:33:26PM +0100, Jiri Pirko wrote:
>> Fri, Jan 22, 2021 at 03:13:12PM CET, andrew@lunn.ch wrote:
>> >> I don't see any way. The userspace is the one who can get the info, from
>> >> the i2c driver. The mlxsw driver has no means to get that info itself.
>> >
>> >Hi Jiri
>> >
>> >Please can you tell us more about this i2c driver. Do you have any
>> >architecture pictures?
>> 
>> Quoting Vadim Pasternak:
>> "
>> Not upstreamed yet.
>> It will be mlxreg-lc driver for line card in drivers/platfrom/mellanox and
>> additional mlxreg-pm for line card powering on/off, setting enable/disable
>> and handling power off upon thermal shutdown event.
>> "
>> 
>> 
>> >
>> >It is not unknown for one driver to embed another driver inside it. So
>> >the i2c driver could be inside the mlxsw. It is also possible to link
>> >drivers together, the mlxsw could go find the i2c driver and make use
>> >of its services.
>> 
>> Okay. Do you have examples? How could the kernel figure out the relation
>> of the instances?
>
>Hi Jiri
>
>One driver, embedded into another? You actually submitted an example:
>
>commit 6882b0aee180f2797b8803bdf699aa45c2e5f2d6
>Author: Vadim Pasternak <vadimp@mellanox.com>
>Date:   Wed Nov 16 15:20:44 2016 +0100
>
>    mlxsw: Introduce support for I2C bus
>    
>    Add I2C bus implementation for Mellanox Technologies Switch ASICs.
>    This includes command interface implementation using input / out mailboxes,
>    whose location is retrieved from the firmware during probe time.
>    
>    Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
>    Reviewed-by: Ido Schimmel <idosch@mellanox.com>
>    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>    Signed-off-by: David S. Miller <davem@davemloft.net>
>
>There are Linux standard APIs for controlling the power to devices,
>the regulator API. So i assume mlxreg-pm will make use of that. There
>are also standard APIs for thermal management, which again, mlxreg-pm
>should be using. The regulator API allows you to find regulators by
>name. So just define a sensible naming convention, and the switch
>driver can lookup the regulator, and turn it on/off as needed.


I don't think it would apply. The thing is, i2c driver has a channel to
the linecard eeprom, from where it can read info about the linecard. The
i2c driver also knows when the linecard is plugged in, unlike mlxsw.
It acts as a standalone driver. Mlxsw has no way to directly find if the
card was plugged in (unpowered) and which type it is.

Not sure how to "embed" it. I don't think any existing API could help.
Basicall mlxsw would have to register a callback to the i2c driver
called every time card is inserted to do auto-provision.
Now consider a case when there are multiple instances of the ASIC on the
system. How to assemble a relationship between mlxsw instance and i2c
driver instance?

But again, auto-provision is only one usecase. Manual provisioning is
needed anyway. And that is exactly what my patchset is aiming to
introduce. Auto-provision can be added when/if needed later on.


>
>I'm guessing there are no standard Linux API which mlxreg-lc fits. I'm
>also not sure it offers anything useful standalone. So i would
>actually embed it inside the switchdev driver, and have internal APIs
>to get information about the line card.
>
>But i'm missing big picture architecture knowledge here, there could
>be reasons why these suggestions don't work.
>
>   Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-27  7:57                         ` Jiri Pirko
@ 2021-01-27 14:14                           ` Andrew Lunn
  2021-01-27 14:57                             ` David Ahern
  2021-01-28  8:14                             ` Jiri Pirko
  0 siblings, 2 replies; 80+ messages in thread
From: Andrew Lunn @ 2021-01-27 14:14 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Ahern, Jakub Kicinski, netdev, davem, jacob.e.keller,
	roopa, mlxsw, vadimp

> >There are Linux standard APIs for controlling the power to devices,
> >the regulator API. So i assume mlxreg-pm will make use of that. There
> >are also standard APIs for thermal management, which again, mlxreg-pm
> >should be using. The regulator API allows you to find regulators by
> >name. So just define a sensible naming convention, and the switch
> >driver can lookup the regulator, and turn it on/off as needed.
> 
> 
> I don't think it would apply. The thing is, i2c driver has a channel to
> the linecard eeprom, from where it can read info about the linecard. The
> i2c driver also knows when the linecard is plugged in, unlike mlxsw.
> It acts as a standalone driver. Mlxsw has no way to directly find if the
> card was plugged in (unpowered) and which type it is.
> 
> Not sure how to "embed" it. I don't think any existing API could help.
> Basicall mlxsw would have to register a callback to the i2c driver
> called every time card is inserted to do auto-provision.
> Now consider a case when there are multiple instances of the ASIC on the
> system. How to assemble a relationship between mlxsw instance and i2c
> driver instance?

You have that knowledge already, otherwise you cannot solve this
problem at all. The switch is an PCIe device right? So when the bus is
enumerated, the driver loads. How do you bind the i2c driver to the
i2c bus? You cannot enumerate i2c, so you must have some hard coded
knowledge somewhere? You just need to get that knowledge into the
mlxsw driver so it can bind its internal i2c client driver to the i2c
bus. That way you avoid user space, i guess maybe udev rules, or some
daemon monitoring propriety /sys files?

> But again, auto-provision is only one usecase. Manual provisioning is
> needed anyway. And that is exactly what my patchset is aiming to
> introduce. Auto-provision can be added when/if needed later on.

I still don't actually get this use case. Why would i want to manually
provision?

	Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-27 14:14                           ` Andrew Lunn
@ 2021-01-27 14:57                             ` David Ahern
  2021-01-28  8:14                             ` Jiri Pirko
  1 sibling, 0 replies; 80+ messages in thread
From: David Ahern @ 2021-01-27 14:57 UTC (permalink / raw)
  To: Andrew Lunn, Jiri Pirko
  Cc: Jakub Kicinski, netdev, davem, jacob.e.keller, roopa, mlxsw, vadimp

On 1/27/21 7:14 AM, Andrew Lunn wrote:
>> I don't think it would apply. The thing is, i2c driver has a channel to
>> the linecard eeprom, from where it can read info about the linecard. The
>> i2c driver also knows when the linecard is plugged in, unlike mlxsw.
>> It acts as a standalone driver. Mlxsw has no way to directly find if the
>> card was plugged in (unpowered) and which type it is.
>>
>> Not sure how to "embed" it. I don't think any existing API could help.
>> Basicall mlxsw would have to register a callback to the i2c driver
>> called every time card is inserted to do auto-provision.
>> Now consider a case when there are multiple instances of the ASIC on the
>> system. How to assemble a relationship between mlxsw instance and i2c
>> driver instance?
> 
> You have that knowledge already, otherwise you cannot solve this
> problem at all. The switch is an PCIe device right? So when the bus is
> enumerated, the driver loads. How do you bind the i2c driver to the
> i2c bus? You cannot enumerate i2c, so you must have some hard coded
> knowledge somewhere? You just need to get that knowledge into the
> mlxsw driver so it can bind its internal i2c client driver to the i2c
> bus. That way you avoid user space, i guess maybe udev rules, or some
> daemon monitoring propriety /sys files?
> 
>> But again, auto-provision is only one usecase. Manual provisioning is
>> needed anyway. And that is exactly what my patchset is aiming to
>> introduce. Auto-provision can be added when/if needed later on.
> 
> I still don't actually get this use case. Why would i want to manually
> provision?
> 


+1.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-27 14:14                           ` Andrew Lunn
  2021-01-27 14:57                             ` David Ahern
@ 2021-01-28  8:14                             ` Jiri Pirko
  2021-01-28 14:17                               ` Andrew Lunn
  1 sibling, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-28  8:14 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David Ahern, Jakub Kicinski, netdev, davem, jacob.e.keller,
	roopa, mlxsw, vadimp

Wed, Jan 27, 2021 at 03:14:34PM CET, andrew@lunn.ch wrote:
>> >There are Linux standard APIs for controlling the power to devices,
>> >the regulator API. So i assume mlxreg-pm will make use of that. There
>> >are also standard APIs for thermal management, which again, mlxreg-pm
>> >should be using. The regulator API allows you to find regulators by
>> >name. So just define a sensible naming convention, and the switch
>> >driver can lookup the regulator, and turn it on/off as needed.
>> 
>> 
>> I don't think it would apply. The thing is, i2c driver has a channel to
>> the linecard eeprom, from where it can read info about the linecard. The
>> i2c driver also knows when the linecard is plugged in, unlike mlxsw.
>> It acts as a standalone driver. Mlxsw has no way to directly find if the
>> card was plugged in (unpowered) and which type it is.
>> 
>> Not sure how to "embed" it. I don't think any existing API could help.
>> Basicall mlxsw would have to register a callback to the i2c driver
>> called every time card is inserted to do auto-provision.
>> Now consider a case when there are multiple instances of the ASIC on the
>> system. How to assemble a relationship between mlxsw instance and i2c
>> driver instance?
>
>You have that knowledge already, otherwise you cannot solve this

No I don't have it. I'm not sure why do you say so. The mlxsw and i2c
driver act independently.


>problem at all. The switch is an PCIe device right? So when the bus is
>enumerated, the driver loads. How do you bind the i2c driver to the
>i2c bus? You cannot enumerate i2c, so you must have some hard coded
>knowledge somewhere? You just need to get that knowledge into the
>mlxsw driver so it can bind its internal i2c client driver to the i2c

There is no internal i2c client driver for this.


>bus. That way you avoid user space, i guess maybe udev rules, or some
>daemon monitoring propriety /sys files?
>
>> But again, auto-provision is only one usecase. Manual provisioning is
>> needed anyway. And that is exactly what my patchset is aiming to
>> introduce. Auto-provision can be added when/if needed later on.
>
>I still don't actually get this use case. Why would i want to manually
>provision?

Because user might want to see the system with all netdevices, configure
them, change the linecard if they got broken and all config, like
bridge, tc, etc will stay on the netdevices. Again, this is the same we
do for split port. This is important requirement, user don't want to see
netdevices come and go when he is plugging/unplugging cables. Linecards
are the same in this matter. Basically is is a "splitter module",
replacing the "splitter cable"


>
>	Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-28  8:14                             ` Jiri Pirko
@ 2021-01-28 14:17                               ` Andrew Lunn
  2021-01-29  7:20                                 ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Andrew Lunn @ 2021-01-28 14:17 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Ahern, Jakub Kicinski, netdev, davem, jacob.e.keller,
	roopa, mlxsw, vadimp

On Thu, Jan 28, 2021 at 09:14:34AM +0100, Jiri Pirko wrote:
> Wed, Jan 27, 2021 at 03:14:34PM CET, andrew@lunn.ch wrote:
> >> >There are Linux standard APIs for controlling the power to devices,
> >> >the regulator API. So i assume mlxreg-pm will make use of that. There
> >> >are also standard APIs for thermal management, which again, mlxreg-pm
> >> >should be using. The regulator API allows you to find regulators by
> >> >name. So just define a sensible naming convention, and the switch
> >> >driver can lookup the regulator, and turn it on/off as needed.
> >> 
> >> 
> >> I don't think it would apply. The thing is, i2c driver has a channel to
> >> the linecard eeprom, from where it can read info about the linecard. The
> >> i2c driver also knows when the linecard is plugged in, unlike mlxsw.
> >> It acts as a standalone driver. Mlxsw has no way to directly find if the
> >> card was plugged in (unpowered) and which type it is.
> >> 
> >> Not sure how to "embed" it. I don't think any existing API could help.
> >> Basicall mlxsw would have to register a callback to the i2c driver
> >> called every time card is inserted to do auto-provision.
> >> Now consider a case when there are multiple instances of the ASIC on the
> >> system. How to assemble a relationship between mlxsw instance and i2c
> >> driver instance?
> >
> >You have that knowledge already, otherwise you cannot solve this
> 
> No I don't have it. I'm not sure why do you say so. The mlxsw and i2c
> driver act independently.

Ah, so you just export some information in /sys from the i2c driver?
And you expect the poor user to look at the values, and copy paste
them to the correct mlxsw instance? 50/50 guess if you have two
switches, and hope they don't make a typO?

> >I still don't actually get this use case. Why would i want to manually
> >provision?
> 
> Because user might want to see the system with all netdevices, configure
> them, change the linecard if they got broken and all config, like
> bridge, tc, etc will stay on the netdevices. Again, this is the same we
> do for split port. This is important requirement, user don't want to see
> netdevices come and go when he is plugging/unplugging cables. Linecards
> are the same in this matter. Basically is is a "splitter module",
> replacing the "splitter cable"

So, what is the real use case here? Why might the user want to do
this?

Is it: The magic smoke has escaped. The user takes a spare switch, and
wants to put it on her desk to configure it where she has a comfy chair
and piece and quiet, unlike in the data centre, which is very noise,
only has hard plastic chair, no coffee allowed. She makes her best
guess at the configuration, up/downs the interfaces, reboots, to make
sure it is permanent, and only then moves to the data centre to swap
the dead router for the new one, and fix up whatever configuration
errors there are, while sat on the hard chair?

So this feature is about comfy chair vs hard chair?

I'm also wondering about the splitter port use case. At what point do
you tell the user that it is physically impossible to split the port
because the SFP simply does not support it? You say the netdevs don't
come/go. I assume the link never goes up, but how does the user know
the configuration is FUBAR, not the SFP? To me, it seems a lot more
intuitive that when i remove an SFP which has been split into 4, and
pop in an SFP which only supports a single stream, the 3 extra netdevs
would just vanish.

   Andrew


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-28 14:17                               ` Andrew Lunn
@ 2021-01-29  7:20                                 ` Jiri Pirko
       [not found]                                   ` <YBQujIdnFtEhWqTF@lunn.ch>
  0 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-29  7:20 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David Ahern, Jakub Kicinski, netdev, davem, jacob.e.keller,
	roopa, mlxsw, vadimp

Thu, Jan 28, 2021 at 03:17:13PM CET, andrew@lunn.ch wrote:
>On Thu, Jan 28, 2021 at 09:14:34AM +0100, Jiri Pirko wrote:
>> Wed, Jan 27, 2021 at 03:14:34PM CET, andrew@lunn.ch wrote:
>> >> >There are Linux standard APIs for controlling the power to devices,
>> >> >the regulator API. So i assume mlxreg-pm will make use of that. There
>> >> >are also standard APIs for thermal management, which again, mlxreg-pm
>> >> >should be using. The regulator API allows you to find regulators by
>> >> >name. So just define a sensible naming convention, and the switch
>> >> >driver can lookup the regulator, and turn it on/off as needed.
>> >> 
>> >> 
>> >> I don't think it would apply. The thing is, i2c driver has a channel to
>> >> the linecard eeprom, from where it can read info about the linecard. The
>> >> i2c driver also knows when the linecard is plugged in, unlike mlxsw.
>> >> It acts as a standalone driver. Mlxsw has no way to directly find if the
>> >> card was plugged in (unpowered) and which type it is.
>> >> 
>> >> Not sure how to "embed" it. I don't think any existing API could help.
>> >> Basicall mlxsw would have to register a callback to the i2c driver
>> >> called every time card is inserted to do auto-provision.
>> >> Now consider a case when there are multiple instances of the ASIC on the
>> >> system. How to assemble a relationship between mlxsw instance and i2c
>> >> driver instance?
>> >
>> >You have that knowledge already, otherwise you cannot solve this
>> 
>> No I don't have it. I'm not sure why do you say so. The mlxsw and i2c
>> driver act independently.
>
>Ah, so you just export some information in /sys from the i2c driver?
>And you expect the poor user to look at the values, and copy paste
>them to the correct mlxsw instance? 50/50 guess if you have two
>switches, and hope they don't make a typO?

Which values are you talking about here exactly?


>
>> >I still don't actually get this use case. Why would i want to manually
>> >provision?
>> 
>> Because user might want to see the system with all netdevices, configure
>> them, change the linecard if they got broken and all config, like
>> bridge, tc, etc will stay on the netdevices. Again, this is the same we
>> do for split port. This is important requirement, user don't want to see
>> netdevices come and go when he is plugging/unplugging cables. Linecards
>> are the same in this matter. Basically is is a "splitter module",
>> replacing the "splitter cable"
>
>So, what is the real use case here? Why might the user want to do
>this?
>
>Is it: The magic smoke has escaped. The user takes a spare switch, and
>wants to put it on her desk to configure it where she has a comfy chair
>and piece and quiet, unlike in the data centre, which is very noise,
>only has hard plastic chair, no coffee allowed. She makes her best
>guess at the configuration, up/downs the interfaces, reboots, to make
>sure it is permanent, and only then moves to the data centre to swap
>the dead router for the new one, and fix up whatever configuration
>errors there are, while sat on the hard chair?
>
>So this feature is about comfy chair vs hard chair?

I don't really get the question, but configuring switch w/o any linecard
and plug the linecards in later on is definitelly a usecase.


>
>I'm also wondering about the splitter port use case. At what point do
>you tell the user that it is physically impossible to split the port
>because the SFP simply does not support it? You say the netdevs don't
>come/go. I assume the link never goes up, but how does the user know
>the configuration is FUBAR, not the SFP? To me, it seems a lot more
>intuitive that when i remove an SFP which has been split into 4, and
>pop in an SFP which only supports a single stream, the 3 extra netdevs
>would just vanish.

As I wrote easlier in this thread, for hw that supports it, there should
be possibility to turn on "autosplit" mode that would do exactly what
you describe. But depends on a usecase. User should be in power to
configure "autosplit" for split cables and "autodetect" for linecards.
Both should be treated in the same way I believe.


>
>   Andrew
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [patch net-next RFC 00/10] introduce line card support for modular switch
       [not found]                                   ` <YBQujIdnFtEhWqTF@lunn.ch>
@ 2021-01-29 16:45                                     ` Vadim Pasternak
  2021-01-29 17:31                                       ` Andrew Lunn
  2021-02-01  1:43                                       ` Andrew Lunn
  0 siblings, 2 replies; 80+ messages in thread
From: Vadim Pasternak @ 2021-01-29 16:45 UTC (permalink / raw)
  To: Andrew Lunn, Jiri Pirko
  Cc: David Ahern, Jakub Kicinski, netdev, davem, jacob.e.keller,
	Roopa Prabhu, mlxsw



> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: Friday, January 29, 2021 5:50 PM
> To: Jiri Pirko <jiri@resnulli.us>
> Cc: David Ahern <dsahern@gmail.com>; Jakub Kicinski <kuba@kernel.org>;
> netdev@vger.kernel.org; davem@davemloft.net; jacob.e.keller@intel.com;
> Roopa Prabhu <roopa@nvidia.com>; mlxsw <mlxsw@nvidia.com>; Vadim
> Pasternak <vadimp@nvidia.com>
> Subject: Re: [patch net-next RFC 00/10] introduce line card support for
> modular switch
> 
> On Fri, Jan 29, 2021 at 08:20:15AM +0100, Jiri Pirko wrote:
> > Thu, Jan 28, 2021 at 03:17:13PM CET, andrew@lunn.ch wrote:
> > >On Thu, Jan 28, 2021 at 09:14:34AM +0100, Jiri Pirko wrote:
> > >> Wed, Jan 27, 2021 at 03:14:34PM CET, andrew@lunn.ch wrote:
> > >> >> >There are Linux standard APIs for controlling the power to
> > >> >> >devices, the regulator API. So i assume mlxreg-pm will make use
> > >> >> >of that. There are also standard APIs for thermal management,
> > >> >> >which again, mlxreg-pm should be using. The regulator API
> > >> >> >allows you to find regulators by name. So just define a
> > >> >> >sensible naming convention, and the switch driver can lookup the
> regulator, and turn it on/off as needed.
> > >> >>
> > >> >>
> > >> >> I don't think it would apply. The thing is, i2c driver has a
> > >> >> channel to the linecard eeprom, from where it can read info
> > >> >> about the linecard. The i2c driver also knows when the linecard is
> plugged in, unlike mlxsw.
> > >> >> It acts as a standalone driver. Mlxsw has no way to directly
> > >> >> find if the card was plugged in (unpowered) and which type it is.
> > >> >>
> > >> >> Not sure how to "embed" it. I don't think any existing API could help.
> > >> >> Basicall mlxsw would have to register a callback to the i2c
> > >> >> driver called every time card is inserted to do auto-provision.
> > >> >> Now consider a case when there are multiple instances of the
> > >> >> ASIC on the system. How to assemble a relationship between mlxsw
> > >> >> instance and i2c driver instance?
> > >> >
> > >> >You have that knowledge already, otherwise you cannot solve this
> > >>
> > >> No I don't have it. I'm not sure why do you say so. The mlxsw and
> > >> i2c driver act independently.
> > >
> > >Ah, so you just export some information in /sys from the i2c driver?
> > >And you expect the poor user to look at the values, and copy paste
> > >them to the correct mlxsw instance? 50/50 guess if you have two
> > >switches, and hope they don't make a typO?
> >
> > Which values are you talking about here exactly?
> 
> The i2c driver tells you what line card is actually inserted.
> Hopefully it interprets the EEPROM and gives the user a nice string. You then
> need to use this string to provision the switch, so it knows what line card has
> been inserted. Or the user can pre-prevision, make a guess as to what card will
> actually be inserted sometime in the future, tell the switch, and hope that
> actually happens.

Hi Andrew,

mlxsw I2C driver is BMC side driver. Its purpose to provide hwmon,
thermal, QSFP info for the chassis management at BMC side.
It works on top of PRM interface and it is associated with the chip I2C
slave device.
It doesn't aware of system topology, it knows nothing about system I2C
tree, what is EEPROM, where it located and so on. This is not a scope of
this driver.

Platform line card driver is aware of line card I2C topology, its
responsibility is to detect line card basic hardware type, create I2C
topology (mux), connect all the necessary I2C devices, like hotswap
devices, voltage and power regulators devices, iio/a2d devices and line
card EEPROMs, creates LED instances for LED located on a line card, exposes
line card related attributes, like CPLD and FPGA versions, reset causes,
required powered through line card hwmon interface.

> 
> > >> >I still don't actually get this use case. Why would i want to
> > >> >manually provision?
> > >>
> > >> Because user might want to see the system with all netdevices,
> > >> configure them, change the linecard if they got broken and all
> > >> config, like bridge, tc, etc will stay on the netdevices. Again,
> > >> this is the same we do for split port. This is important
> > >> requirement, user don't want to see netdevices come and go when he
> > >> is plugging/unplugging cables. Linecards are the same in this
> > >> matter. Basically is is a "splitter module", replacing the "splitter cable"
> > >
> > >So, what is the real use case here? Why might the user want to do
> > >this?
> > >
> > >Is it: The magic smoke has escaped. The user takes a spare switch,
> > >and wants to put it on her desk to configure it where she has a comfy
> > >chair and piece and quiet, unlike in the data centre, which is very
> > >noise, only has hard plastic chair, no coffee allowed. She makes her
> > >best guess at the configuration, up/downs the interfaces, reboots, to
> > >make sure it is permanent, and only then moves to the data centre to
> > >swap the dead router for the new one, and fix up whatever
> > >configuration errors there are, while sat on the hard chair?
> > >
> > >So this feature is about comfy chair vs hard chair?
> >
> > I don't really get the question, but configuring switch w/o any
> > linecard and plug the linecards in later on is definitelly a usecase.
> 
> It is a requirement, not a use case.
> 
> A use case is the big picture, what is the user doing, at the big picture level. In
> the somewhat absurd example given above, the user case is, the router chassis
> has died, but they think the line cards are O.K. They want to do as much
> configuration and testing as possible before going into the data center to
> actually replace the chassis.  By reusing the existing line cards, they reduce the
> risk of getting the cables plugged into the wrong port.
> 
> From the use case, you can derive the requirements. In order to test that ifup --
> all puts the IP addresses in the right places, it needs to have the netdevs with
> the correct names. Either they need line cards in the router, or they need to be
> able to fake line cards. There is also a requirement that the line cards are easy
> to exchange. They do not need to be fully hot-plugable, since one router is
> dead, the other can be powered off. But ideally you want simple thumb
> screws, not a Philips screwdriver or an allan key. There is also a requirement
> that this provision is persistent, since the user is likely to reboot the system in
> order to test the configurations files actually work at boot time. Either the
> switch driver needs to write the information to FLASH, or user space needs to
> tell it on every boot, a systemd service file or similar.
> 
> But since you have not been able to answer my question, i wonder if
> everything is backwards around here. Your architecture is broken, you cannot
> easily determine what line card is inserted, so you need a workaround,
> provisioning. But provision might actually be a useful feature, so lets try to sell
> the feature, and gloss over that the architecture is broken.
> 
> So, i would like to see the architecture fixed first. The switch driver somehow
> talks to the i2c driver to find out what card is in the slot, and configures itself.
> My guess is, every other switch does this, this is what the user expects as a
> base feature, it is what we want Linux to do by default.
> 
> You can later add provisioning, where if the slot is empty, you can fake a line
> card, to fulfil the use cases. And when the slot is actually filled, you can verify
> what is plugged in matches what was expected, and be very noise if not.
> 
> 	  Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-29 16:45                                     ` Vadim Pasternak
@ 2021-01-29 17:31                                       ` Andrew Lunn
  2021-01-30 14:19                                         ` Jiri Pirko
  2021-02-01  1:43                                       ` Andrew Lunn
  1 sibling, 1 reply; 80+ messages in thread
From: Andrew Lunn @ 2021-01-29 17:31 UTC (permalink / raw)
  To: Vadim Pasternak
  Cc: Jiri Pirko, David Ahern, Jakub Kicinski, netdev, davem,
	jacob.e.keller, Roopa Prabhu, mlxsw

> Platform line card driver is aware of line card I2C topology, its
> responsibility is to detect line card basic hardware type, create I2C
> topology (mux), connect all the necessary I2C devices, like hotswap
> devices, voltage and power regulators devices, iio/a2d devices and line
> card EEPROMs, creates LED instances for LED located on a line card, exposes
> line card related attributes, like CPLD and FPGA versions, reset causes,
> required powered through line card hwmon interface.

So this driver, and the switch driver need to talk to each other, so
the switch driver actually knows what, if anything, is in the slot.

    Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-29 17:31                                       ` Andrew Lunn
@ 2021-01-30 14:19                                         ` Jiri Pirko
       [not found]                                           ` <251d1e12-1d61-0922-31f8-a8313f18f194@gmail.com>
  0 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-01-30 14:19 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Vadim Pasternak, David Ahern, Jakub Kicinski, netdev, davem,
	jacob.e.keller, Roopa Prabhu, mlxsw

Fri, Jan 29, 2021 at 06:31:59PM CET, andrew@lunn.ch wrote:
>> Platform line card driver is aware of line card I2C topology, its
>> responsibility is to detect line card basic hardware type, create I2C
>> topology (mux), connect all the necessary I2C devices, like hotswap
>> devices, voltage and power regulators devices, iio/a2d devices and line
>> card EEPROMs, creates LED instances for LED located on a line card, exposes
>> line card related attributes, like CPLD and FPGA versions, reset causes,
>> required powered through line card hwmon interface.
>
>So this driver, and the switch driver need to talk to each other, so
>the switch driver actually knows what, if anything, is in the slot.

Not possible in case the BMC is a different host, which is common
scenario.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-01-29 16:45                                     ` Vadim Pasternak
  2021-01-29 17:31                                       ` Andrew Lunn
@ 2021-02-01  1:43                                       ` Andrew Lunn
  1 sibling, 0 replies; 80+ messages in thread
From: Andrew Lunn @ 2021-02-01  1:43 UTC (permalink / raw)
  To: Vadim Pasternak
  Cc: Jiri Pirko, David Ahern, Jakub Kicinski, netdev, davem,
	jacob.e.keller, Roopa Prabhu, mlxsw

> Platform line card driver is aware of line card I2C topology, its
> responsibility is to detect line card basic hardware type, create I2C
> topology (mux), connect all the necessary I2C devices, like hotswap
> devices, voltage and power regulators devices, iio/a2d devices and line
> card EEPROMs, creates LED instances for LED located on a line card, exposes
> line card related attributes, like CPLD and FPGA versions, reset causes,
> required powered through line card hwmon interface.

Jiri says the hardware is often connected to the BMC. But you do
expose much of this to the host as well? You want devlink dev info to
show the version information. Use devlink dev flash to upgrade the
bitfile in the CPD and FPGA. The hwmon instances are pretty pointless
on the BMC where nobody can see them. Are there temperature sensors
involved? The host is where the thermal policy is running, deciding
what to throttle, or shut down when it gets too hot. LEDs can be
controlled via /sys/class/led as expected?

So exporting what the line card actually is to the host is not really
a problem, it is just one more bit of information amongst everything
else already exposed to it.

	Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
       [not found]                                           ` <251d1e12-1d61-0922-31f8-a8313f18f194@gmail.com>
@ 2021-02-01  8:16                                             ` Jiri Pirko
  2021-02-01 13:41                                               ` Andrew Lunn
  0 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-02-01  8:16 UTC (permalink / raw)
  To: David Ahern
  Cc: Andrew Lunn, Vadim Pasternak, Jakub Kicinski, netdev, davem,
	jacob.e.keller, Roopa Prabhu, mlxsw

Sun, Jan 31, 2021 at 06:09:24PM CET, dsahern@gmail.com wrote:
>On 1/30/21 7:19 AM, Jiri Pirko wrote:
>> Fri, Jan 29, 2021 at 06:31:59PM CET, andrew@lunn.ch wrote:
>>>> Platform line card driver is aware of line card I2C topology, its
>>>> responsibility is to detect line card basic hardware type, create I2C
>>>> topology (mux), connect all the necessary I2C devices, like hotswap
>>>> devices, voltage and power regulators devices, iio/a2d devices and line
>>>> card EEPROMs, creates LED instances for LED located on a line card, exposes
>>>> line card related attributes, like CPLD and FPGA versions, reset causes,
>>>> required powered through line card hwmon interface.
>>>
>>> So this driver, and the switch driver need to talk to each other, so
>>> the switch driver actually knows what, if anything, is in the slot.
>> 
>> Not possible in case the BMC is a different host, which is common
>> scenario.
>> 
>
>User provisions a 4 port card, but a 2 port card is inserted. How is
>this detected and the user told the wrong card is inserted?

The card won't get activated.
The user won't see the type of inserted linecard. Again, it is not
possible for ASIC to access the linecard eeprom. See Vadim's reply.


>
>If it is not detected that's a serious problem, no?

That is how it is, unfortunatelly.


>
>If it is detected why can't the same mechanism be used for auto
>provisioning?

Again, not possible to detect.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-02-01  8:16                                             ` Jiri Pirko
@ 2021-02-01 13:41                                               ` Andrew Lunn
  2021-02-03 14:57                                                 ` Jiri Pirko
  0 siblings, 1 reply; 80+ messages in thread
From: Andrew Lunn @ 2021-02-01 13:41 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Ahern, Vadim Pasternak, Jakub Kicinski, netdev, davem,
	jacob.e.keller, Roopa Prabhu, mlxsw

On Mon, Feb 01, 2021 at 09:16:41AM +0100, Jiri Pirko wrote:
> Sun, Jan 31, 2021 at 06:09:24PM CET, dsahern@gmail.com wrote:
> >On 1/30/21 7:19 AM, Jiri Pirko wrote:
> >> Fri, Jan 29, 2021 at 06:31:59PM CET, andrew@lunn.ch wrote:
> >>>> Platform line card driver is aware of line card I2C topology, its
> >>>> responsibility is to detect line card basic hardware type, create I2C
> >>>> topology (mux), connect all the necessary I2C devices, like hotswap
> >>>> devices, voltage and power regulators devices, iio/a2d devices and line
> >>>> card EEPROMs, creates LED instances for LED located on a line card, exposes
> >>>> line card related attributes, like CPLD and FPGA versions, reset causes,
> >>>> required powered through line card hwmon interface.
> >>>
> >>> So this driver, and the switch driver need to talk to each other, so
> >>> the switch driver actually knows what, if anything, is in the slot.
> >> 
> >> Not possible in case the BMC is a different host, which is common
> >> scenario.
> >> 
> >
> >User provisions a 4 port card, but a 2 port card is inserted. How is
> >this detected and the user told the wrong card is inserted?
> 
> The card won't get activated.
> The user won't see the type of inserted linecard. Again, it is not
> possible for ASIC to access the linecard eeprom. See Vadim's reply.
> 
> 
> >
> >If it is not detected that's a serious problem, no?
> 
> That is how it is, unfortunatelly.
> 
> 
> >
> >If it is detected why can't the same mechanism be used for auto
> >provisioning?
> 
> Again, not possible to detect.

If the platform line card driver is running in the host, you can
detect it. From your wording, it sounds like some systems do have this
driver in the host. So please add the needed code.

When the platform line card driver is on the BMC, you need a proxy in
between. Isn't this what IPMI and Redfish is all about? The proxy
driver can offer the same interface as the platform line card driver.

    Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-02-01 13:41                                               ` Andrew Lunn
@ 2021-02-03 14:57                                                 ` Jiri Pirko
  2021-02-03 16:26                                                   ` Andrew Lunn
  0 siblings, 1 reply; 80+ messages in thread
From: Jiri Pirko @ 2021-02-03 14:57 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David Ahern, Vadim Pasternak, Jakub Kicinski, netdev, davem,
	jacob.e.keller, Roopa Prabhu, mlxsw

Mon, Feb 01, 2021 at 02:41:07PM CET, andrew@lunn.ch wrote:
>On Mon, Feb 01, 2021 at 09:16:41AM +0100, Jiri Pirko wrote:
>> Sun, Jan 31, 2021 at 06:09:24PM CET, dsahern@gmail.com wrote:
>> >On 1/30/21 7:19 AM, Jiri Pirko wrote:
>> >> Fri, Jan 29, 2021 at 06:31:59PM CET, andrew@lunn.ch wrote:
>> >>>> Platform line card driver is aware of line card I2C topology, its
>> >>>> responsibility is to detect line card basic hardware type, create I2C
>> >>>> topology (mux), connect all the necessary I2C devices, like hotswap
>> >>>> devices, voltage and power regulators devices, iio/a2d devices and line
>> >>>> card EEPROMs, creates LED instances for LED located on a line card, exposes
>> >>>> line card related attributes, like CPLD and FPGA versions, reset causes,
>> >>>> required powered through line card hwmon interface.
>> >>>
>> >>> So this driver, and the switch driver need to talk to each other, so
>> >>> the switch driver actually knows what, if anything, is in the slot.
>> >> 
>> >> Not possible in case the BMC is a different host, which is common
>> >> scenario.
>> >> 
>> >
>> >User provisions a 4 port card, but a 2 port card is inserted. How is
>> >this detected and the user told the wrong card is inserted?
>> 
>> The card won't get activated.
>> The user won't see the type of inserted linecard. Again, it is not
>> possible for ASIC to access the linecard eeprom. See Vadim's reply.
>> 
>> 
>> >
>> >If it is not detected that's a serious problem, no?
>> 
>> That is how it is, unfortunatelly.
>> 
>> 
>> >
>> >If it is detected why can't the same mechanism be used for auto
>> >provisioning?
>> 
>> Again, not possible to detect.
>
>If the platform line card driver is running in the host, you can
>detect it. From your wording, it sounds like some systems do have this
>driver in the host. So please add the needed code.

But if not, it cannot. We still need the provisioning then.


>
>When the platform line card driver is on the BMC, you need a proxy in
>between. Isn't this what IPMI and Redfish is all about? The proxy
>driver can offer the same interface as the platform line card driver.

Do you have any example of kernel driver which is doing some thing like
that?


>
>    Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [patch net-next RFC 00/10] introduce line card support for modular switch
  2021-02-03 14:57                                                 ` Jiri Pirko
@ 2021-02-03 16:26                                                   ` Andrew Lunn
  0 siblings, 0 replies; 80+ messages in thread
From: Andrew Lunn @ 2021-02-03 16:26 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Ahern, Vadim Pasternak, Jakub Kicinski, netdev, davem,
	jacob.e.keller, Roopa Prabhu, mlxsw

> >When the platform line card driver is on the BMC, you need a proxy in
> >between. Isn't this what IPMI and Redfish is all about? The proxy
> >driver can offer the same interface as the platform line card driver.
> 
> Do you have any example of kernel driver which is doing some thing like
> that?

drivers/hwmon/ibmaem.c is a pretty normal looking HWMON driver, for
temperature/power/energy sensors which are connected to the BMC and
accessed over IPMI.

char/ipmi/ipmi_watchdog.c as the name suggests is a watchdog. At first
glance its API to user space follows the standard API, even if it does
not make use of the watchdog subsystem core.

These two should give you examples of how you talk to the BMC from a
kernel driver.

	 Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2021-02-03 16:27 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-13 12:12 [patch net-next RFC 00/10] introduce line card support for modular switch Jiri Pirko
2021-01-13 12:12 ` [patch net-next RFC 01/10] devlink: add support to create line card and expose to user Jiri Pirko
2021-01-15 15:47   ` Ido Schimmel
2021-01-13 12:12 ` [patch net-next RFC 02/10] devlink: implement line card provisioning Jiri Pirko
2021-01-15 16:03   ` Ido Schimmel
2021-01-15 16:51     ` Jiri Pirko
2021-01-15 18:09       ` Ido Schimmel
2021-01-18 12:50         ` Jiri Pirko
2021-01-13 12:12 ` [patch net-next RFC 03/10] devlink: implement line card active state Jiri Pirko
2021-01-15 16:06   ` Ido Schimmel
2021-01-15 16:52     ` Jiri Pirko
2021-01-13 12:12 ` [patch net-next RFC 04/10] devlink: append split port number to the port name Jiri Pirko
2021-01-13 12:12 ` [patch net-next RFC 05/10] devlink: add port to line card relationship set Jiri Pirko
2021-01-15 16:10   ` Ido Schimmel
2021-01-15 16:53     ` Jiri Pirko
2021-01-13 12:12 ` [patch net-next RFC 06/10] netdevsim: introduce line card support Jiri Pirko
2021-01-13 12:12 ` [patch net-next RFC 07/10] netdevsim: allow port objects to be linked with line cards Jiri Pirko
2021-01-13 12:12 ` [patch net-next RFC 08/10] netdevsim: create devlink line card object and implement provisioning Jiri Pirko
2021-01-15 16:30   ` Ido Schimmel
2021-01-15 16:54     ` Jiri Pirko
2021-01-13 12:12 ` [patch net-next RFC 09/10] netdevsim: implement line card activation Jiri Pirko
2021-01-13 12:12 ` [patch net-next RFC 10/10] selftests: add netdevsim devlink lc test Jiri Pirko
2021-01-13 13:39 ` [patch iproute2/net-next RFC] devlink: add support for linecard show and provision Jiri Pirko
2021-01-14  2:07 ` [patch net-next RFC 00/10] introduce line card support for modular switch Andrew Lunn
2021-01-14  7:39   ` Jiri Pirko
2021-01-14 22:56     ` Jacob Keller
2021-01-15 14:19       ` Jiri Pirko
2021-01-19 11:56   ` Jiri Pirko
2021-01-19 14:51     ` Andrew Lunn
2021-01-20  8:36       ` Jiri Pirko
2021-01-20 13:56         ` Andrew Lunn
2021-01-20 23:41           ` Jakub Kicinski
2021-01-21  0:01             ` Andrew Lunn
2021-01-21  0:16               ` Jakub Kicinski
2021-01-21 15:34               ` Jiri Pirko
2021-01-21 15:32             ` Jiri Pirko
2021-01-21 16:38               ` David Ahern
2021-01-22  7:28                 ` Jiri Pirko
2021-01-22 14:13                   ` Andrew Lunn
2021-01-26 11:33                     ` Jiri Pirko
2021-01-26 13:56                       ` Andrew Lunn
2021-01-27  7:57                         ` Jiri Pirko
2021-01-27 14:14                           ` Andrew Lunn
2021-01-27 14:57                             ` David Ahern
2021-01-28  8:14                             ` Jiri Pirko
2021-01-28 14:17                               ` Andrew Lunn
2021-01-29  7:20                                 ` Jiri Pirko
     [not found]                                   ` <YBQujIdnFtEhWqTF@lunn.ch>
2021-01-29 16:45                                     ` Vadim Pasternak
2021-01-29 17:31                                       ` Andrew Lunn
2021-01-30 14:19                                         ` Jiri Pirko
     [not found]                                           ` <251d1e12-1d61-0922-31f8-a8313f18f194@gmail.com>
2021-02-01  8:16                                             ` Jiri Pirko
2021-02-01 13:41                                               ` Andrew Lunn
2021-02-03 14:57                                                 ` Jiri Pirko
2021-02-03 16:26                                                   ` Andrew Lunn
2021-02-01  1:43                                       ` Andrew Lunn
2021-01-22  8:05                 ` Jiri Pirko
2021-01-19 16:23     ` David Ahern
2021-01-20  8:37       ` Jiri Pirko
2021-01-14  2:27 ` Jakub Kicinski
2021-01-14  7:48   ` Jiri Pirko
2021-01-14 23:30     ` Jakub Kicinski
2021-01-15 14:39       ` Jiri Pirko
2021-01-15 19:26         ` Jakub Kicinski
2021-01-18 13:00           ` Jiri Pirko
2021-01-18 17:59             ` Jakub Kicinski
2021-01-19 11:51               ` Jiri Pirko
2021-01-18 22:55             ` David Ahern
2021-01-22  8:01               ` Jiri Pirko
2021-01-14 22:58   ` Jacob Keller
2021-01-14 23:20     ` Jakub Kicinski
2021-01-15 14:40       ` Jiri Pirko
2021-01-15 15:43 ` Ido Schimmel
2021-01-15 16:55   ` Jiri Pirko
2021-01-15 18:01     ` Ido Schimmel
2021-01-18 13:03       ` Jiri Pirko
2021-01-18 18:01 ` Edwin Peer
2021-01-18 22:57   ` David Ahern
2021-01-18 23:40     ` Edwin Peer
2021-01-19  2:39       ` David Ahern
2021-01-19  5:06         ` Edwin Peer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).