netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/3] devlink show controller info
@ 2020-08-25 13:58 Parav Pandit
  2020-08-25 13:58 ` [PATCH net-next 1/3] devlink: Add comment block for missing port attributes Parav Pandit
                   ` (4 more replies)
  0 siblings, 5 replies; 51+ messages in thread
From: Parav Pandit @ 2020-08-25 13:58 UTC (permalink / raw)
  To: davem, kuba, netdev; +Cc: roid, saeedm, Parav Pandit

Hi Dave, Jakub,

Currently a devlink instance that supports an eswitch handles eswitch
ports of two type of controllers.
(1) controller discovered on same system where eswitch resides.
This is the case where PCI PF/VF of a controller and devlink eswitch
instance both are located on a single system.
(2) controller located on other system.
This is the case where a controller is located in one system and its
devlink eswitch ports are located in a different system. In this case
devlink instance of the eswitch only have access to ports of the
controller.
However, there is no way to describe that a eswitch devlink port
belongs to which controller (mainly which external host controller).
This problem is more prevalent when port attribute such as PF and VF
numbers are overlapping between multiple controllers of same eswitch.
Due to this, for a specific switch_id, unique phys_port_name cannot
be constructed for such devlink ports.

This short series overcomes this limitation by defining external
controller attribute of the devlink port.
A port that represents an external controller's port defines which
controller it belongs to. Based on this a unique phys_port_name
is prepared.

phys_port_name construction using unique controller number is only
applicable to external controller ports. This ensures that for
non smartnic usecases where there is no external controller,
phys_port_name stays same as before.

Patch summary:
Patch-1 Adds the missing comment for the port attributes
Patch-2 prepares devlink port with external controller attribute
Patch-3 mlx5 driver to use new external controller attribute API

Parav Pandit (3):
  devlink: Add comment block for missing port attributes
  devlink: Consider other controller while building phys_port_name
  net/mlx5: E-switch, Set controller attribute for PCI PF and VF ports

 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |  5 ++++
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  1 +
 .../mellanox/mlx5/core/eswitch_offloads.c     | 21 ++++++++++++++
 include/net/devlink.h                         | 10 ++++++-
 include/uapi/linux/devlink.h                  |  1 +
 net/core/devlink.c                            | 29 +++++++++++++++++++
 6 files changed, 66 insertions(+), 1 deletion(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH net-next 1/3] devlink: Add comment block for missing port attributes
  2020-08-25 13:58 [PATCH net-next 0/3] devlink show controller info Parav Pandit
@ 2020-08-25 13:58 ` Parav Pandit
  2020-08-25 13:58 ` [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name Parav Pandit
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-08-25 13:58 UTC (permalink / raw)
  To: davem, kuba, netdev; +Cc: roid, saeedm, Parav Pandit, Jiri Pirko

Add comment block for physical, PF and VF port attributes.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
---
 include/net/devlink.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 8f3c8a443238..3c7ba3e1f490 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -73,6 +73,9 @@ struct devlink_port_pci_vf_attrs {
  * @splittable: indicates if the port can be split.
  * @lanes: maximum number of lanes the port supports. 0 value is not passed to netlink.
  * @switch_id: if the port is part of switch, this is buffer with ID, otherwise this is NULL
+ * @phys: physical port attributes
+ * @pci_pf: PCI PF port attributes
+ * @pci_vf: PCI VF port attributes
  */
 struct devlink_port_attrs {
 	u8 split:1,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-08-25 13:58 [PATCH net-next 0/3] devlink show controller info Parav Pandit
  2020-08-25 13:58 ` [PATCH net-next 1/3] devlink: Add comment block for missing port attributes Parav Pandit
@ 2020-08-25 13:58 ` Parav Pandit
  2020-08-26  0:32   ` Jakub Kicinski
  2020-08-25 13:58 ` [PATCH net-next 3/3] net/mlx5: E-switch, Set controller attribute for PCI PF and VF ports Parav Pandit
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 51+ messages in thread
From: Parav Pandit @ 2020-08-25 13:58 UTC (permalink / raw)
  To: davem, kuba, netdev; +Cc: roid, saeedm, Parav Pandit, Jiri Pirko

A devlink port may be for a controller consist of PCI device.
A devlink instance holds ports of two types of controllers.
(1) controller discovered on same system where eswitch resides
This is the case where PCI PF/VF of a controller and devlink eswitch
instance both are located on a single system.
(2) controller located on other system.
This is the case where a controller is located in one system and its
devlink eswitch ports are located in a different system. In this case
devlink instance of the eswitch only have access to ports of the
controller.

When a devlink eswitch instance serves the devlink ports of both
controllers together, PCI PF/VF numbers may overlap.
Due to this a unique phys_port_name cannot be constructed.

To differentiate ports of a controller external/remote to the devlink
instance, consider external controller number while forming the
phys_port_name.
Also return this optional controller number of the port via netlink
interface.

An example output:

$ devlink port show pci/0000:00:08.0/2
pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf pfnum 0 vfnum 1 splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show -jp pci/0000:00:08.0/2
{
    "port": {
        "pci/0000:00:08.0/1": {
            "type": "eth",
            "netdev": "eth7",
            "controller": 0,
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

An example representor netdev udev name consist of controller
annotation for external controller with controller number = 0,
for PF 0 and VF 1:

udevadm test-builtin net_id /sys/class/net/eth7
Using default interface naming scheme 'v245'.
ID_NET_NAMING_SCHEME=v245
ID_NET_NAME_PATH=enp0s8f0nc0pf0vf1
Unload module index
Unloaded link configuration context.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
---
 include/net/devlink.h        |  7 ++++++-
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 29 +++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 3c7ba3e1f490..612f107b94ab 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -71,16 +71,20 @@ struct devlink_port_pci_vf_attrs {
  * @flavour: flavour of the port
  * @split: indicates if this is split port
  * @splittable: indicates if the port can be split.
+ * @controller_valid: indicates if the controller_num field is valid.
  * @lanes: maximum number of lanes the port supports. 0 value is not passed to netlink.
  * @switch_id: if the port is part of switch, this is buffer with ID, otherwise this is NULL
  * @phys: physical port attributes
  * @pci_pf: PCI PF port attributes
  * @pci_vf: PCI VF port attributes
+ * @controller_num: Controller number if a port is for other controller.
  */
 struct devlink_port_attrs {
 	u8 split:1,
-	   splittable:1;
+	   splittable:1,
+	   controller_valid:1;
 	u32 lanes;
+	u32 controller_num;
 	enum devlink_port_flavour flavour;
 	struct netdev_phys_item_id switch_id;
 	union {
@@ -1206,6 +1210,7 @@ void devlink_port_type_ib_set(struct devlink_port *devlink_port,
 void devlink_port_type_clear(struct devlink_port *devlink_port);
 void devlink_port_attrs_set(struct devlink_port *devlink_port,
 			    struct devlink_port_attrs *devlink_port_attrs);
+void devlink_port_attrs_controller_set(struct devlink_port *devlink_port, u32 controller);
 void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf);
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
 				   u16 pf, u16 vf);
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index cfef4245ea5a..886cddf6a0a9 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -458,6 +458,7 @@ enum devlink_attr {
 	DEVLINK_ATTR_PORT_LANES,			/* u32 */
 	DEVLINK_ATTR_PORT_SPLITTABLE,			/* u8 */
 
+	DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,		/* u32 */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 58c8bb07fa19..b9b71f119446 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -553,6 +553,11 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 	default:
 		break;
 	}
+
+	if (attrs->controller_valid &&
+	    nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER, attrs->controller_num))
+		return -EMSGSIZE;
+
 	return 0;
 }
 
@@ -7711,6 +7716,22 @@ void devlink_port_attrs_set(struct devlink_port *devlink_port,
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
 
+/**
+ *	devlink_port_attrs_controller_set - Set external controller identifier
+ *
+ *	@devlink_port: devlink port
+ *	@controller: associated controller number for the devlink port instance
+ *	devlink_port_attrs_controller_set() Sets the external controller identifier
+ *	for the port. This should be called by the driver for a devlink port which is associated
+ *	with the controller which is external to the devlink instance.
+ */
+void devlink_port_attrs_controller_set(struct devlink_port *devlink_port, u32 controller)
+{
+	devlink_port->attrs.controller_valid = true;
+	devlink_port->attrs.controller_num = controller;
+}
+EXPORT_SYMBOL_GPL(devlink_port_attrs_controller_set);
+
 /**
  *	devlink_port_attrs_pci_pf_set - Set PCI PF port attributes
  *
@@ -7762,6 +7783,14 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 	if (!devlink_port->attrs_set)
 		return -EOPNOTSUPP;
 
+	if (attrs->controller_valid) {
+		n = snprintf(name, len, "c%u", attrs->controller_num);
+		if (n >= len)
+			return -EINVAL;
+		len -= n;
+		name += n;
+	}
+
 	switch (attrs->flavour) {
 	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
 	case DEVLINK_PORT_FLAVOUR_VIRTUAL:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next 3/3] net/mlx5: E-switch, Set controller attribute for PCI PF and VF ports
  2020-08-25 13:58 [PATCH net-next 0/3] devlink show controller info Parav Pandit
  2020-08-25 13:58 ` [PATCH net-next 1/3] devlink: Add comment block for missing port attributes Parav Pandit
  2020-08-25 13:58 ` [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name Parav Pandit
@ 2020-08-25 13:58 ` Parav Pandit
  2020-09-08 14:42 ` [PATCH net-next v2 0/6] devlink show controller number Parav Pandit
  2020-09-09  4:50 ` [PATCH net-next v3 0/6] devlink show controller number Parav Pandit
  4 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-08-25 13:58 UTC (permalink / raw)
  To: davem, kuba, netdev; +Cc: roid, saeedm, Parav Pandit, Jiri Pirko

ECPF supports one external host controller. When a port belongs to an external
host controller, setup external controller port attribute.

An example of a VF port of a ECPF supporting for an external controller:

$ devlink port show pci/0000:00:08.0/2
pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf pfnum 0 vfnum 1 splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show -jp pci/0000:00:08.0/2 {
    "port": {
        "pci/0000:00:08.0/2": {
            "type": "eth",
            "netdev": "eth7",
            "controller": 0,
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

An example representor netdev udev name consist of controller
annotation for external controller with controller number = 0,
for PF 0 and VF 1:

$ udevadm test-builtin net_id /sys/class/net/eth7
Using default interface naming scheme 'v245'.
ID_NET_NAMING_SCHEME=v245
ID_NET_NAME_PATH=enp0s8f0nc0pf0vf1
Unload module index
Unloaded link configuration context.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |  5 +++++
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  1 +
 .../mellanox/mlx5/core/eswitch_offloads.c     | 21 +++++++++++++++++++
 3 files changed, 27 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index e13e5d1b3eae..9c79d9b84ebd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1210,6 +1210,7 @@ is_devlink_port_supported(const struct mlx5_core_dev *dev,
 static int register_devlink_port(struct mlx5_core_dev *dev,
 				 struct mlx5e_rep_priv *rpriv)
 {
+	struct mlx5_esw_offload *offloads = &dev->priv.eswitch->offloads;
 	struct devlink *devlink = priv_to_devlink(dev);
 	struct mlx5_eswitch_rep *rep = rpriv->rep;
 	struct devlink_port_attrs attrs = {};
@@ -1232,10 +1233,14 @@ static int register_devlink_port(struct mlx5_core_dev *dev,
 	} else if (rep->vport == MLX5_VPORT_PF) {
 		memcpy(rpriv->dl_port.attrs.switch_id.id, &ppid.id[0], ppid.id_len);
 		rpriv->dl_port.attrs.switch_id.id_len = ppid.id_len;
+		if (mlx5_core_is_ecpf_esw_manager(dev))
+			devlink_port_attrs_controller_set(&rpriv->dl_port, offloads->host_number);
 		devlink_port_attrs_pci_pf_set(&rpriv->dl_port, pfnum);
 	} else if (mlx5_eswitch_is_vf_vport(dev->priv.eswitch, rpriv->rep->vport)) {
 		memcpy(rpriv->dl_port.attrs.switch_id.id, &ppid.id[0], ppid.id_len);
 		rpriv->dl_port.attrs.switch_id.id_len = ppid.id_len;
+		if (mlx5_core_is_ecpf_esw_manager(dev))
+			devlink_port_attrs_controller_set(&rpriv->dl_port, offloads->host_number);
 		devlink_port_attrs_pci_vf_set(&rpriv->dl_port,
 					      pfnum, rep->vport - 1);
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 867d8120b8a5..7455fbd21a0a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -217,6 +217,7 @@ struct mlx5_esw_offload {
 	atomic64_t num_flows;
 	enum devlink_eswitch_encap_mode encap;
 	struct ida vport_metadata_ida;
+	unsigned int host_number; /* ECPF supports one external host */
 };
 
 /* E-Switch MC FDB table hash node */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index d2516922d867..56b42ab66f3b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -2110,6 +2110,23 @@ int mlx5_esw_funcs_changed_handler(struct notifier_block *nb, unsigned long type
 	return NOTIFY_OK;
 }
 
+static int mlx5_esw_host_number_init(struct mlx5_eswitch *esw)
+{
+	const u32 *query_host_out;
+
+	if (!mlx5_core_is_ecpf_esw_manager(esw->dev))
+		return 0;
+
+	query_host_out = mlx5_esw_query_functions(esw->dev);
+	if (IS_ERR(query_host_out))
+		return PTR_ERR(query_host_out);
+
+	esw->offloads.host_number = MLX5_GET(query_esw_functions_out, query_host_out,
+					     host_params_context.host_number);
+	kvfree(query_host_out);
+	return 0;
+}
+
 int esw_offloads_enable(struct mlx5_eswitch *esw)
 {
 	struct mlx5_vport *vport;
@@ -2124,6 +2141,10 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
 	mutex_init(&esw->offloads.termtbl_mutex);
 	mlx5_rdma_enable_roce(esw->dev);
 
+	err = mlx5_esw_host_number_init(esw);
+	if (err)
+		goto err_vport_metadata;
+
 	err = esw_set_passing_vport_metadata(esw, true);
 	if (err)
 		goto err_vport_metadata;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-08-25 13:58 ` [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name Parav Pandit
@ 2020-08-26  0:32   ` Jakub Kicinski
  2020-08-26  4:27     ` Parav Pandit
  0 siblings, 1 reply; 51+ messages in thread
From: Jakub Kicinski @ 2020-08-26  0:32 UTC (permalink / raw)
  To: Parav Pandit; +Cc: davem, netdev, roid, saeedm, Jiri Pirko

On Tue, 25 Aug 2020 16:58:38 +0300 Parav Pandit wrote:
> A devlink port may be for a controller consist of PCI device.
> A devlink instance holds ports of two types of controllers.
> (1) controller discovered on same system where eswitch resides
> This is the case where PCI PF/VF of a controller and devlink eswitch
> instance both are located on a single system.
> (2) controller located on other system.
> This is the case where a controller is located in one system and its
> devlink eswitch ports are located in a different system. In this case
> devlink instance of the eswitch only have access to ports of the
> controller.
> 
> When a devlink eswitch instance serves the devlink ports of both
> controllers together, PCI PF/VF numbers may overlap.
> Due to this a unique phys_port_name cannot be constructed.

This description is clear as mud to me. Is it just me? Can someone
understand this?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-08-26  0:32   ` Jakub Kicinski
@ 2020-08-26  4:27     ` Parav Pandit
  2020-08-26 20:07       ` Jakub Kicinski
  0 siblings, 1 reply; 51+ messages in thread
From: Parav Pandit @ 2020-08-26  4:27 UTC (permalink / raw)
  To: Jakub Kicinski, Parav Pandit; +Cc: davem, netdev, roid, saeedm, Jiri Pirko

Hi Jakub,

> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org>
> On Behalf Of Jakub Kicinski
> Sent: Wednesday, August 26, 2020 6:02 AM
> 
> On Tue, 25 Aug 2020 16:58:38 +0300 Parav Pandit wrote:
> > A devlink port may be for a controller consist of PCI device.
> > A devlink instance holds ports of two types of controllers.
> > (1) controller discovered on same system where eswitch resides This is
> > the case where PCI PF/VF of a controller and devlink eswitch instance
> > both are located on a single system.
> > (2) controller located on other system.
> > This is the case where a controller is located in one system and its
> > devlink eswitch ports are located in a different system. In this case
> > devlink instance of the eswitch only have access to ports of the
> > controller.
> >
> > When a devlink eswitch instance serves the devlink ports of both
> > controllers together, PCI PF/VF numbers may overlap.
> > Due to this a unique phys_port_name cannot be constructed.
> 
> This description is clear as mud to me. Is it just me? Can someone understand
> this?

I would like to improve this description.
Do you have an input to describe these two different controllers, each has same PF and VF numbers?

$ devlink port show looks like below without a controller annotation.
pci/0000:00:08.0/0: type eth netdev eth5 flavour physical
pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
                                                                                                   ^^^^^^^




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-08-26  4:27     ` Parav Pandit
@ 2020-08-26 20:07       ` Jakub Kicinski
  2020-08-27  4:31         ` Parav Pandit
  0 siblings, 1 reply; 51+ messages in thread
From: Jakub Kicinski @ 2020-08-26 20:07 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

On Wed, 26 Aug 2020 04:27:35 +0000 Parav Pandit wrote:
> > On Tue, 25 Aug 2020 16:58:38 +0300 Parav Pandit wrote:  
> > > A devlink port may be for a controller consist of PCI device.
> > > A devlink instance holds ports of two types of controllers.
> > > (1) controller discovered on same system where eswitch resides This is
> > > the case where PCI PF/VF of a controller and devlink eswitch instance
> > > both are located on a single system.
> > > (2) controller located on other system.
> > > This is the case where a controller is located in one system and its
> > > devlink eswitch ports are located in a different system. In this case
> > > devlink instance of the eswitch only have access to ports of the
> > > controller.
> > >
> > > When a devlink eswitch instance serves the devlink ports of both
> > > controllers together, PCI PF/VF numbers may overlap.
> > > Due to this a unique phys_port_name cannot be constructed.  
> > 
> > This description is clear as mud to me. Is it just me? Can someone understand
> > this?  
> 
> I would like to improve this description.
> Do you have an input to describe these two different controllers,
> each has same PF and VF numbers?

Not yet, I'm just trying to figure out how things come together.

Are some VFs of the same PF under one controller and other ones under 
a different controller? 

> $ devlink port show looks like below without a controller annotation.
> pci/0000:00:08.0/0: type eth netdev eth5 flavour physical
> pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0

How can you have two PF 0? Aaah - by controller you mean hardware IP,
not whoever is controlling the switching! So the chip has multiple HW
controllers, each of which can have multiple PFs?

Definitely please make that more clear.

Why is @controller_num not under PCI port attrs, but a separate field
without even a mention of PCI? Are some of the controllers a different
bus?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-08-26 20:07       ` Jakub Kicinski
@ 2020-08-27  4:31         ` Parav Pandit
  2020-08-27 18:32           ` Jakub Kicinski
  0 siblings, 1 reply; 51+ messages in thread
From: Parav Pandit @ 2020-08-27  4:31 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko


> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, August 27, 2020 1:38 AM
> 
> On Wed, 26 Aug 2020 04:27:35 +0000 Parav Pandit wrote:
> > > On Tue, 25 Aug 2020 16:58:38 +0300 Parav Pandit wrote:
> > > > A devlink port may be for a controller consist of PCI device.
> > > > A devlink instance holds ports of two types of controllers.
> > > > (1) controller discovered on same system where eswitch resides
> > > > This is the case where PCI PF/VF of a controller and devlink
> > > > eswitch instance both are located on a single system.
> > > > (2) controller located on other system.
> > > > This is the case where a controller is located in one system and
> > > > its devlink eswitch ports are located in a different system. In
> > > > this case devlink instance of the eswitch only have access to
> > > > ports of the controller.
> > > >
> > > > When a devlink eswitch instance serves the devlink ports of both
> > > > controllers together, PCI PF/VF numbers may overlap.
> > > > Due to this a unique phys_port_name cannot be constructed.
> > >
> > > This description is clear as mud to me. Is it just me? Can someone
> > > understand this?
> >
> > I would like to improve this description.
> > Do you have an input to describe these two different controllers, each
> > has same PF and VF numbers?
> 
> Not yet, I'm just trying to figure out how things come together.
> 
> Are some VFs of the same PF under one controller and other ones under a
> different controller?
> 
Correct.

> > $ devlink port show looks like below without a controller annotation.
> > pci/0000:00:08.0/0: type eth netdev eth5 flavour physical
> > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> 
> How can you have two PF 0? Aaah - by controller you mean hardware IP, not
> whoever is controlling the switching! So the chip has multiple HW controllers,
> each of which can have multiple PFs?
> 
Hardware IP is one. This IP is plugged into two PCI root complexes.
One is eswitch PF, this PF has its own VFs/SFs.
Other PF(s) plugged into an second PCI Root complex serving the server system.
So you are right there are multiple PFs. Both the PFs have same PCI BDF.

> Definitely please make that more clear.
> 
Ok. yeah.
I should add above details.
I think a text diagram might be more useful.

> Why is @controller_num not under PCI port attrs, but a separate field
> without even a mention of PCI? Are some of the controllers a different bus?
I think It can be added under PCI attribute, but than we need to add it for PF and VF and in future for SF too.
So added it as generic. But yes, keeping it as part of PCI port attributes is good too.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-08-27  4:31         ` Parav Pandit
@ 2020-08-27 18:32           ` Jakub Kicinski
  2020-08-27 20:15             ` Parav Pandit
  0 siblings, 1 reply; 51+ messages in thread
From: Jakub Kicinski @ 2020-08-27 18:32 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

On Thu, 27 Aug 2020 04:31:43 +0000 Parav Pandit wrote:
> > > $ devlink port show looks like below without a controller annotation.
> > > pci/0000:00:08.0/0: type eth netdev eth5 flavour physical
> > > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> > > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0  
> > 
> > How can you have two PF 0? Aaah - by controller you mean hardware IP, not
> > whoever is controlling the switching! So the chip has multiple HW controllers,
> > each of which can have multiple PFs?
> >   
> Hardware IP is one. This IP is plugged into two PCI root complexes.
> One is eswitch PF, this PF has its own VFs/SFs.
> Other PF(s) plugged into an second PCI Root complex serving the server system.
> So you are right there are multiple PFs.

I find it strange that you have pfnum 0 everywhere but then different
controllers. For MultiHost at Netronome we've used pfnum to distinguish
between the hosts. ASIC must have some unique identifiers for each PF.

I'm not aware of any practical reason for creating PFs on one RC
without reinitializing all the others.

I can see how having multiple controllers may make things clearer, but
adding another layer of IDs while the one under it is unused (pfnum=0)
feels very unnecessary.

> Both the PFs have same PCI BDF.

BDFs are irrelevant.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-08-27 18:32           ` Jakub Kicinski
@ 2020-08-27 20:15             ` Parav Pandit
  2020-08-27 21:42               ` Jakub Kicinski
  0 siblings, 1 reply; 51+ messages in thread
From: Parav Pandit @ 2020-08-27 20:15 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko



> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, August 28, 2020 12:02 AM
> 
> On Thu, 27 Aug 2020 04:31:43 +0000 Parav Pandit wrote:
> > > > $ devlink port show looks like below without a controller annotation.
> > > > pci/0000:00:08.0/0: type eth netdev eth5 flavour physical
> > > > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> > > > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> > >
> > > How can you have two PF 0? Aaah - by controller you mean hardware
> > > IP, not whoever is controlling the switching! So the chip has
> > > multiple HW controllers, each of which can have multiple PFs?
> > >
> > Hardware IP is one. This IP is plugged into two PCI root complexes.
> > One is eswitch PF, this PF has its own VFs/SFs.
> > Other PF(s) plugged into an second PCI Root complex serving the server
> system.
> > So you are right there are multiple PFs.
> 
> I find it strange that you have pfnum 0 everywhere but then different
> controllers.
There are multiple PFs, connected to different PCI RC. So device has same pfnum for both the PFs.

> For MultiHost at Netronome we've used pfnum to distinguish
> between the hosts. ASIC must have some unique identifiers for each PF.
> 
Yes. there is. It is identified by a unique controller number; internally it is called host_number.
But internal host_number is misleading term as multiple cables of same physical card can be plugged into single host.
So identifying based on a unique (controller) number and matching that up on external cable is desired.

> I'm not aware of any practical reason for creating PFs on one RC without
> reinitializing all the others.
I may be misunderstanding, but how is initialization is related multiple PFs?

> 
> I can see how having multiple controllers may make things clearer, but adding
> another layer of IDs while the one under it is unused (pfnum=0) feels very
> unnecessary.
pfnum=0 is used today. not sure I understand your comment about being unused.
Can you please explain?

Hierarchical naming kind of make sense, but if you have other ideas to annotate the controller, without changing the hardware pfnum, lets discuss.

> 
> > Both the PFs have same PCI BDF.
> 
> BDFs are irrelevant.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-08-27 20:15             ` Parav Pandit
@ 2020-08-27 21:42               ` Jakub Kicinski
  2020-08-28  4:27                 ` Parav Pandit
  0 siblings, 1 reply; 51+ messages in thread
From: Jakub Kicinski @ 2020-08-27 21:42 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

On Thu, 27 Aug 2020 20:15:01 +0000 Parav Pandit wrote:
> > From: Jakub Kicinski <kuba@kernel.org>
> >
> > I find it strange that you have pfnum 0 everywhere but then different
> > controllers.  
> There are multiple PFs, connected to different PCI RC. So device has
> same pfnum for both the PFs.
> 
> > For MultiHost at Netronome we've used pfnum to distinguish
> > between the hosts. ASIC must have some unique identifiers for each
> > PF. 
> Yes. there is. It is identified by a unique controller number;
> internally it is called host_number. But internal host_number is
> misleading term as multiple cables of same physical card can be
> plugged into single host. So identifying based on a unique
> (controller) number and matching that up on external cable is desired.
> 
> > I'm not aware of any practical reason for creating PFs on one RC
> > without reinitializing all the others.  
> I may be misunderstanding, but how is initialization is related
> multiple PFs?

If the number of PFs is static it should be possible to understand
which one is on which system.

> > I can see how having multiple controllers may make things clearer,
> > but adding another layer of IDs while the one under it is unused
> > (pfnum=0) feels very unnecessary.  
> pfnum=0 is used today. not sure I understand your comment about being
> unused. Can you please explain?

You examples only ever have pfnum 0:

From patch 2:

$ devlink port show pci/0000:00:08.0/2
pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf pfnum 0 vfnum 1 splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show -jp pci/0000:00:08.0/2
{
    "port": {
        "pci/0000:00:08.0/1": {
            "type": "eth",
            "netdev": "eth7",
            "controller": 0,
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

From earlier email:

pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0

If you never use pfnum, you can just put the controller ID there, 
like Netronome.

> Hierarchical naming kind of make sense, but if you have other ideas
> to annotate the controller, without changing the hardware pfnum, lets
> discuss.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-08-27 21:42               ` Jakub Kicinski
@ 2020-08-28  4:27                 ` Parav Pandit
  2020-08-28  5:08                   ` Parav Pandit
  2020-08-28 16:43                   ` Jakub Kicinski
  0 siblings, 2 replies; 51+ messages in thread
From: Parav Pandit @ 2020-08-28  4:27 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko


> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, August 28, 2020 3:12 AM
> 
> On Thu, 27 Aug 2020 20:15:01 +0000 Parav Pandit wrote:
> > > From: Jakub Kicinski <kuba@kernel.org>
> > >
> > > I find it strange that you have pfnum 0 everywhere but then
> > > different controllers.
> > There are multiple PFs, connected to different PCI RC. So device has
> > same pfnum for both the PFs.
> >
> > > For MultiHost at Netronome we've used pfnum to distinguish between
> > > the hosts. ASIC must have some unique identifiers for each PF.
> > Yes. there is. It is identified by a unique controller number;
> > internally it is called host_number. But internal host_number is
> > misleading term as multiple cables of same physical card can be
> > plugged into single host. So identifying based on a unique
> > (controller) number and matching that up on external cable is desired.
> >
> > > I'm not aware of any practical reason for creating PFs on one RC
> > > without reinitializing all the others.
> > I may be misunderstanding, but how is initialization is related
> > multiple PFs?
> 
> If the number of PFs is static it should be possible to understand which one is on
> which system.
>
How? How do we tell that pfnum A means external system.
Want to avoid such 'implicit' notion.
 
> > > I can see how having multiple controllers may make things clearer,
> > > but adding another layer of IDs while the one under it is unused
> > > (pfnum=0) feels very unnecessary.
> > pfnum=0 is used today. not sure I understand your comment about being
> > unused. Can you please explain?
> 
> You examples only ever have pfnum 0:
> 
Because both controllers have pfnum 0.

> From patch 2:
> 
> $ devlink port show pci/0000:00:08.0/2
> pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf pfnum 0
> vfnum 1 splittable false
>   function:
>     hw_addr 00:00:00:00:00:00
> 
> $ devlink port show -jp pci/0000:00:08.0/2 {
>     "port": {
>         "pci/0000:00:08.0/1": {
>             "type": "eth",
>             "netdev": "eth7",
>             "controller": 0,
>             "flavour": "pcivf",
>             "pfnum": 0,
>             "vfnum": 1,
>             "splittable": false,
>             "function": {
>                 "hw_addr": "00:00:00:00:00:00"
>             }
>         }
>     }
> }
> 
> From earlier email:
> 
> pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> 
> If you never use pfnum, you can just put the controller ID there, like Netronome.
>
It likely not going to work for us. Because pfnum is not some randomly generated number.
It is linked to the underlying PCI pf number. {pf0, pf1...}
Orchestration sw uses this to identify representor of a PF-VF pair.

Replacing pfnum with controller number breaks this; and it still doesn't tell user that it's the pf on other_host.

So it is used, and would like to continue to use even if there are multiple PFs port (that has same pfnum) under the same eswitch.

In an alternative,
Currently we have pcipf, pcivf (and pcisf) flavours. May be if we introduce new flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
There can be better name than epcipf. I just put epcipf to differentiate it.
However these ports have same attributes as pcipf, pcivf, pcisf flavours.

> > Hierarchical naming kind of make sense, but if you have other ideas to
> > annotate the controller, without changing the hardware pfnum, lets
> > discuss.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-08-28  4:27                 ` Parav Pandit
@ 2020-08-28  5:08                   ` Parav Pandit
  2020-08-28 16:43                   ` Jakub Kicinski
  1 sibling, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-08-28  5:08 UTC (permalink / raw)
  To: Parav Pandit, Jakub Kicinski
  Cc: Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko



> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Parav Pandit
> Sent: Friday, August 28, 2020 9:57 AM
> 
> 
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Friday, August 28, 2020 3:12 AM
> >
> > On Thu, 27 Aug 2020 20:15:01 +0000 Parav Pandit wrote:
> > > > From: Jakub Kicinski <kuba@kernel.org>
> > > >
> > > > I find it strange that you have pfnum 0 everywhere but then
> > > > different controllers.
> > > There are multiple PFs, connected to different PCI RC. So device has
> > > same pfnum for both the PFs.
> > >
> > > > For MultiHost at Netronome we've used pfnum to distinguish between
> > > > the hosts. ASIC must have some unique identifiers for each PF.
> > > Yes. there is. It is identified by a unique controller number;
> > > internally it is called host_number. But internal host_number is
> > > misleading term as multiple cables of same physical card can be
> > > plugged into single host. So identifying based on a unique
> > > (controller) number and matching that up on external cable is desired.
> > >
> > > > I'm not aware of any practical reason for creating PFs on one RC
> > > > without reinitializing all the others.
> > > I may be misunderstanding, but how is initialization is related
> > > multiple PFs?
> >
> > If the number of PFs is static it should be possible to understand
> > which one is on which system.
> >
> How? How do we tell that pfnum A means external system.
> Want to avoid such 'implicit' notion.
> 
> > > > I can see how having multiple controllers may make things clearer,
> > > > but adding another layer of IDs while the one under it is unused
> > > > (pfnum=0) feels very unnecessary.
> > > pfnum=0 is used today. not sure I understand your comment about
> > > being unused. Can you please explain?
> >
> > You examples only ever have pfnum 0:
> >
> Because both controllers have pfnum 0.
> 
> > From patch 2:
> >
> > $ devlink port show pci/0000:00:08.0/2
> > pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf
> > pfnum 0 vfnum 1 splittable false
> >   function:
> >     hw_addr 00:00:00:00:00:00
> >
> > $ devlink port show -jp pci/0000:00:08.0/2 {
> >     "port": {
> >         "pci/0000:00:08.0/1": {
> >             "type": "eth",
> >             "netdev": "eth7",
> >             "controller": 0,
> >             "flavour": "pcivf",
> >             "pfnum": 0,
> >             "vfnum": 1,
> >             "splittable": false,
> >             "function": {
> >                 "hw_addr": "00:00:00:00:00:00"
> >             }
> >         }
> >     }
> > }
> >
> > From earlier email:
> >
> > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> >
> > If you never use pfnum, you can just put the controller ID there, like
> Netronome.
> >
> It likely not going to work for us. Because pfnum is not some randomly
> generated number.
> It is linked to the underlying PCI pf number. {pf0, pf1...} Orchestration sw uses
> this to identify representor of a PF-VF pair.
> 
> Replacing pfnum with controller number breaks this; and it still doesn't tell user
> that it's the pf on other_host.
> 
> So it is used, and would like to continue to use even if there are multiple PFs port
> (that has same pfnum) under the same eswitch.
> 
> In an alternative,
> Currently we have pcipf, pcivf (and pcisf) flavours. May be if we introduce new
> flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
> There can be better name than epcipf. I just put epcipf to differentiate it.
> However these ports have same attributes as pcipf, pcivf, pcisf flavours.
> 
I pressed the send button without an example of an alternative.
Changed eth dev name to be more readable as phys_port_name.

$ devlink port show
pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
pci/0000:00:08.0/2: type eth netdev enp0s8f0_epf0 flavour epcipf pfnum 0

This naming only go far as long as each multi-host controller is in different eswitch.
When user prefers them under same eswitch, we will again have collision.
Hence, I suggest to use controller number that addressed both the use cases.

I do not know when Mellanox plan to support this mode, but I was told that this is likely.
What about Netronome? Is each host in different eswitch?

$ devlink port show
pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
pci/0000:00:08.0/2: type eth netdev enp0s8f0_c0pf0 flavour epcipf pfnum 0

This combines the idea of building phys_port_name of pcipf and external_pcipf ports in same way.
At the same time it has the ability to use the controller number.

Secondly, 
I forgot to mention previously that each controller (in multi host) setup consist of 2 PFs on same cable.
So pfnum!=host_number either.

Hence using controller number covering both use cases looks better to me.

> > > Hierarchical naming kind of make sense, but if you have other ideas
> > > to annotate the controller, without changing the hardware pfnum,
> > > lets discuss.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-08-28  4:27                 ` Parav Pandit
  2020-08-28  5:08                   ` Parav Pandit
@ 2020-08-28 16:43                   ` Jakub Kicinski
  2020-08-29  3:43                     ` Parav Pandit
  1 sibling, 1 reply; 51+ messages in thread
From: Jakub Kicinski @ 2020-08-28 16:43 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

On Fri, 28 Aug 2020 04:27:19 +0000 Parav Pandit wrote:
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Friday, August 28, 2020 3:12 AM
> > 
> > On Thu, 27 Aug 2020 20:15:01 +0000 Parav Pandit wrote:  
> > > > From: Jakub Kicinski <kuba@kernel.org>
> > > >
> > > > I find it strange that you have pfnum 0 everywhere but then
> > > > different controllers.  
> > > There are multiple PFs, connected to different PCI RC. So device has
> > > same pfnum for both the PFs.
> > >  
> > > > For MultiHost at Netronome we've used pfnum to distinguish between
> > > > the hosts. ASIC must have some unique identifiers for each PF.  
> > > Yes. there is. It is identified by a unique controller number;
> > > internally it is called host_number. But internal host_number is
> > > misleading term as multiple cables of same physical card can be
> > > plugged into single host. So identifying based on a unique
> > > (controller) number and matching that up on external cable is desired.
> > >  
> > > > I'm not aware of any practical reason for creating PFs on one RC
> > > > without reinitializing all the others.  
> > > I may be misunderstanding, but how is initialization is related
> > > multiple PFs?  
> > 
> > If the number of PFs is static it should be possible to understand which one is on
> > which system.
>
> How? How do we tell that pfnum A means external system.
> Want to avoid such 'implicit' notion.

How do you tell that controller A means external system?

> > > > I can see how having multiple controllers may make things clearer,
> > > > but adding another layer of IDs while the one under it is unused
> > > > (pfnum=0) feels very unnecessary.  
> > > pfnum=0 is used today. not sure I understand your comment about being
> > > unused. Can you please explain?  
> > 
> > You examples only ever have pfnum 0:
> >   
> Because both controllers have pfnum 0.
> 
> > From patch 2:
> > 
> > $ devlink port show pci/0000:00:08.0/2
> > pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf pfnum 0
> > vfnum 1 splittable false
> >   function:
> >     hw_addr 00:00:00:00:00:00
> > 
> > $ devlink port show -jp pci/0000:00:08.0/2 {
> >     "port": {
> >         "pci/0000:00:08.0/1": {
> >             "type": "eth",
> >             "netdev": "eth7",
> >             "controller": 0,
> >             "flavour": "pcivf",
> >             "pfnum": 0,
> >             "vfnum": 1,
> >             "splittable": false,
> >             "function": {
> >                 "hw_addr": "00:00:00:00:00:00"
> >             }
> >         }
> >     }
> > }
> > 
> > From earlier email:
> > 
> > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> > 
> > If you never use pfnum, you can just put the controller ID there, like Netronome.
> >  
> It likely not going to work for us. Because pfnum is not some randomly generated number.
> It is linked to the underlying PCI pf number. {pf0, pf1...}
> Orchestration sw uses this to identify representor of a PF-VF pair.

For orchestration software which is unaware of controllers ports will
still alias on pf/vf nums.

Besides you have one devlink instance per port currently so I'm guessing
there is no pf1 ever, in your case...

> Replacing pfnum with controller number breaks this; and it still doesn't tell user that it's the pf on other_host.

Neither does the opaque controller id. Maybe now you understand better
why I wanted peer objects :/

> So it is used, and would like to continue to use even if there are multiple PFs port (that has same pfnum) under the same eswitch.
> 
> In an alternative,
> Currently we have pcipf, pcivf (and pcisf) flavours. May be if we introduce new flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
> There can be better name than epcipf. I just put epcipf to differentiate it.
> However these ports have same attributes as pcipf, pcivf, pcisf flavours.

I don't think the controllers are a terrible idea. Seems like a fairly
reasonable extension. But MLX don't seem to need them. And you have a
history of trying to make the Linux APIs look like your FW API.

Jiri, would you mind chiming in? What's your take?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-08-28 16:43                   ` Jakub Kicinski
@ 2020-08-29  3:43                     ` Parav Pandit
  2020-09-01  8:19                       ` Jiri Pirko
  0 siblings, 1 reply; 51+ messages in thread
From: Parav Pandit @ 2020-08-29  3:43 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko



> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, August 28, 2020 10:14 PM
> 
> On Fri, 28 Aug 2020 04:27:19 +0000 Parav Pandit wrote:
> > > From: Jakub Kicinski <kuba@kernel.org>
> > > Sent: Friday, August 28, 2020 3:12 AM
> > >
> > > On Thu, 27 Aug 2020 20:15:01 +0000 Parav Pandit wrote:
> > > > > From: Jakub Kicinski <kuba@kernel.org>
> > > > >
> > > > > I find it strange that you have pfnum 0 everywhere but then
> > > > > different controllers.
> > > > There are multiple PFs, connected to different PCI RC. So device
> > > > has same pfnum for both the PFs.
> > > >
> > > > > For MultiHost at Netronome we've used pfnum to distinguish
> > > > > between the hosts. ASIC must have some unique identifiers for each PF.
> > > > Yes. there is. It is identified by a unique controller number;
> > > > internally it is called host_number. But internal host_number is
> > > > misleading term as multiple cables of same physical card can be
> > > > plugged into single host. So identifying based on a unique
> > > > (controller) number and matching that up on external cable is desired.
> > > >
> > > > > I'm not aware of any practical reason for creating PFs on one RC
> > > > > without reinitializing all the others.
> > > > I may be misunderstanding, but how is initialization is related
> > > > multiple PFs?
> > >
> > > If the number of PFs is static it should be possible to understand
> > > which one is on which system.
> >
> > How? How do we tell that pfnum A means external system.
> > Want to avoid such 'implicit' notion.
> 
> How do you tell that controller A means external system?
Which is why I started with annotating only external controllers, mainly to avoid renaming and breaking current scheme for non_smartnic cases which possibly is the most user base.

But probably external pcipf/vf/sf port flavours are more intuitive combined with controller number.
More below.

> 
> > > > > I can see how having multiple controllers may make things
> > > > > clearer, but adding another layer of IDs while the one under it
> > > > > is unused
> > > > > (pfnum=0) feels very unnecessary.
> > > > pfnum=0 is used today. not sure I understand your comment about
> > > > being unused. Can you please explain?
> > >
> > > You examples only ever have pfnum 0:
> > >
> > Because both controllers have pfnum 0.
> >
> > > From patch 2:
> > >
> > > $ devlink port show pci/0000:00:08.0/2
> > > pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf
> > > pfnum 0 vfnum 1 splittable false
> > >   function:
> > >     hw_addr 00:00:00:00:00:00
> > >
> > > $ devlink port show -jp pci/0000:00:08.0/2 {
> > >     "port": {
> > >         "pci/0000:00:08.0/1": {
> > >             "type": "eth",
> > >             "netdev": "eth7",
> > >             "controller": 0,
> > >             "flavour": "pcivf",
> > >             "pfnum": 0,
> > >             "vfnum": 1,
> > >             "splittable": false,
> > >             "function": {
> > >                 "hw_addr": "00:00:00:00:00:00"
> > >             }
> > >         }
> > >     }
> > > }
> > >
> > > From earlier email:
> > >
> > > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> > > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> > >
> > > If you never use pfnum, you can just put the controller ID there, like
> Netronome.
> > >
> > It likely not going to work for us. Because pfnum is not some randomly
> generated number.
> > It is linked to the underlying PCI pf number. {pf0, pf1...}
> > Orchestration sw uses this to identify representor of a PF-VF pair.
> 
> For orchestration software which is unaware of controllers ports will still alias
> on pf/vf nums.
>
Yes.
Orchestration which will be aware of controller, will use it.
 
> Besides you have one devlink instance per port currently so I'm guessing there is
> no pf1 ever, in your case...
>
Currently there are multiple devlink instance. One for pf0, other for pf1.
Ports of both instances have the same switch id.
 
> > Replacing pfnum with controller number breaks this; and it still doesn't tell user
> that it's the pf on other_host.
> 
> Neither does the opaque controller id. 
Which is why I tossed the epcipf (external pci pf) port flavour that fits in current model.
But doesn't allow multiple external hosts under same eswitch for those devices which has same pci pf, vf numbers among those hosts. (and it is the case for mlnx).

> Maybe now you understand better why I wanted peer objects :/
>
I wasn't against peer object. But showing netdev of peer object assumed no_smartnic, it also assume other_side is also similar Linux kernel.
Anyways, I make humble request get over the past to move forward. :-)

> > So it is used, and would like to continue to use even if there are multiple PFs
> port (that has same pfnum) under the same eswitch.
> >
> > In an alternative,
> > Currently we have pcipf, pcivf (and pcisf) flavours. May be if we introduce new
> flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
> > There can be better name than epcipf. I just put epcipf to differentiate it.
> > However these ports have same attributes as pcipf, pcivf, pcisf flavours.
> 
> I don't think the controllers are a terrible idea. Seems like a fairly reasonable
> extension.
Ok. 
> But MLX don't seem to need them. And you have a history of trying to
> make the Linux APIs look like your FW API.
> 
Because there are two devlink instances for each PF?
I think for now an epcipf, epcivf flavour would just suffice due to lack of multiple devlink instances.
But in long run it is better to have the controller covering few topologies.
Otherwise we will break the rep naming later when multiple controllers are managed by single eswitch (without notion of controller).

Sometime my text is confusing. :-) so adding example of the thoughts below.
Example: Eswitch side devlink port show for multi-host setup considering the smartnic.

$ devlink port show
pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
pci/0000:00:08.0/2: type eth netdev enp0s8f0_c0pf0 flavour epcipf pfnum 0
                                                                                                             ^^^^^ new port flavour.
pci/0000:00:08.1/0: type eth netdev enp0s8f1 flavour physical
pci/0000:00:08.1/1: type eth netdev enp0s8f1_pf1 flavour pcipf pfnum 1
pci/0000:00:08.1/2: type eth netdev enp0s8f1_c0pf1 flavour epcipf pfnum 1

Here one controller has two pci pfs (0,1}. Eswitch shows that they are external pci ports.
Whenever (not sure when), mlnx converts to single devlink instance, this will continue to work.
It will also work when multiple controller(s) (of external host) ports have same switch_id (for orchestration).
And this doesn't break any backward compatibility for non multihost, non smatnic users.

> Jiri, would you mind chiming in? What's your take?

Will wait for his inputs..

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-08-29  3:43                     ` Parav Pandit
@ 2020-09-01  8:19                       ` Jiri Pirko
  2020-09-01  8:53                         ` Parav Pandit
  0 siblings, 1 reply; 51+ messages in thread
From: Jiri Pirko @ 2020-09-01  8:19 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jakub Kicinski, Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

Sat, Aug 29, 2020 at 05:43:58AM CEST, parav@nvidia.com wrote:
>
>
>> From: Jakub Kicinski <kuba@kernel.org>
>> Sent: Friday, August 28, 2020 10:14 PM
>> 
>> On Fri, 28 Aug 2020 04:27:19 +0000 Parav Pandit wrote:
>> > > From: Jakub Kicinski <kuba@kernel.org>
>> > > Sent: Friday, August 28, 2020 3:12 AM
>> > >
>> > > On Thu, 27 Aug 2020 20:15:01 +0000 Parav Pandit wrote:
>> > > > > From: Jakub Kicinski <kuba@kernel.org>
>> > > > >
>> > > > > I find it strange that you have pfnum 0 everywhere but then
>> > > > > different controllers.
>> > > > There are multiple PFs, connected to different PCI RC. So device
>> > > > has same pfnum for both the PFs.
>> > > >
>> > > > > For MultiHost at Netronome we've used pfnum to distinguish
>> > > > > between the hosts. ASIC must have some unique identifiers for each PF.
>> > > > Yes. there is. It is identified by a unique controller number;
>> > > > internally it is called host_number. But internal host_number is
>> > > > misleading term as multiple cables of same physical card can be
>> > > > plugged into single host. So identifying based on a unique
>> > > > (controller) number and matching that up on external cable is desired.
>> > > >
>> > > > > I'm not aware of any practical reason for creating PFs on one RC
>> > > > > without reinitializing all the others.
>> > > > I may be misunderstanding, but how is initialization is related
>> > > > multiple PFs?
>> > >
>> > > If the number of PFs is static it should be possible to understand
>> > > which one is on which system.
>> >
>> > How? How do we tell that pfnum A means external system.
>> > Want to avoid such 'implicit' notion.
>> 
>> How do you tell that controller A means external system?

Perhaps the attr name could be explicitly containing "external" word?
Like:
"ext_controller" or "extnum" (similar to "pfnum" and "vfnum") something
like that.


>Which is why I started with annotating only external controllers, mainly to avoid renaming and breaking current scheme for non_smartnic cases which possibly is the most user base.
>
>But probably external pcipf/vf/sf port flavours are more intuitive combined with controller number.
>More below.
>
>> 
>> > > > > I can see how having multiple controllers may make things
>> > > > > clearer, but adding another layer of IDs while the one under it
>> > > > > is unused
>> > > > > (pfnum=0) feels very unnecessary.
>> > > > pfnum=0 is used today. not sure I understand your comment about
>> > > > being unused. Can you please explain?
>> > >
>> > > You examples only ever have pfnum 0:
>> > >
>> > Because both controllers have pfnum 0.
>> >
>> > > From patch 2:
>> > >
>> > > $ devlink port show pci/0000:00:08.0/2
>> > > pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf
>> > > pfnum 0 vfnum 1 splittable false
>> > >   function:
>> > >     hw_addr 00:00:00:00:00:00
>> > >
>> > > $ devlink port show -jp pci/0000:00:08.0/2 {
>> > >     "port": {
>> > >         "pci/0000:00:08.0/1": {
>> > >             "type": "eth",
>> > >             "netdev": "eth7",
>> > >             "controller": 0,
>> > >             "flavour": "pcivf",
>> > >             "pfnum": 0,
>> > >             "vfnum": 1,
>> > >             "splittable": false,
>> > >             "function": {
>> > >                 "hw_addr": "00:00:00:00:00:00"
>> > >             }
>> > >         }
>> > >     }
>> > > }
>> > >
>> > > From earlier email:
>> > >
>> > > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
>> > > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
>> > >
>> > > If you never use pfnum, you can just put the controller ID there, like
>> Netronome.
>> > >
>> > It likely not going to work for us. Because pfnum is not some randomly
>> generated number.
>> > It is linked to the underlying PCI pf number. {pf0, pf1...}
>> > Orchestration sw uses this to identify representor of a PF-VF pair.
>> 
>> For orchestration software which is unaware of controllers ports will still alias
>> on pf/vf nums.
>>
>Yes.
>Orchestration which will be aware of controller, will use it.
> 
>> Besides you have one devlink instance per port currently so I'm guessing there is
>> no pf1 ever, in your case...
>>
>Currently there are multiple devlink instance. One for pf0, other for pf1.
>Ports of both instances have the same switch id.
> 
>> > Replacing pfnum with controller number breaks this; and it still doesn't tell user
>> that it's the pf on other_host.
>> 
>> Neither does the opaque controller id. 
>Which is why I tossed the epcipf (external pci pf) port flavour that fits in current model.
>But doesn't allow multiple external hosts under same eswitch for those devices which has same pci pf, vf numbers among those hosts. (and it is the case for mlnx).
>
>> Maybe now you understand better why I wanted peer objects :/
>>
>I wasn't against peer object. But showing netdev of peer object assumed no_smartnic, it also assume other_side is also similar Linux kernel.
>Anyways, I make humble request get over the past to move forward. :-)
>
>> > So it is used, and would like to continue to use even if there are multiple PFs
>> port (that has same pfnum) under the same eswitch.
>> >
>> > In an alternative,
>> > Currently we have pcipf, pcivf (and pcisf) flavours. May be if we introduce new
>> flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
>> > There can be better name than epcipf. I just put epcipf to differentiate it.
>> > However these ports have same attributes as pcipf, pcivf, pcisf flavours.
>> 
>> I don't think the controllers are a terrible idea. Seems like a fairly reasonable
>> extension.
>Ok. 
>> But MLX don't seem to need them. And you have a history of trying to
>> make the Linux APIs look like your FW API.
>> 
>Because there are two devlink instances for each PF?
>I think for now an epcipf, epcivf flavour would just suffice due to lack of multiple devlink instances.
>But in long run it is better to have the controller covering few topologies.
>Otherwise we will break the rep naming later when multiple controllers are managed by single eswitch (without notion of controller).
>
>Sometime my text is confusing. :-) so adding example of the thoughts below.
>Example: Eswitch side devlink port show for multi-host setup considering the smartnic.
>
>$ devlink port show
>pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
>pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
>pci/0000:00:08.0/2: type eth netdev enp0s8f0_c0pf0 flavour epcipf pfnum 0
>                                                                                                             ^^^^^ new port flavour.
>pci/0000:00:08.1/0: type eth netdev enp0s8f1 flavour physical
>pci/0000:00:08.1/1: type eth netdev enp0s8f1_pf1 flavour pcipf pfnum 1
>pci/0000:00:08.1/2: type eth netdev enp0s8f1_c0pf1 flavour epcipf pfnum 1
>
>Here one controller has two pci pfs (0,1}. Eswitch shows that they are external pci ports.
>Whenever (not sure when), mlnx converts to single devlink instance, this will continue to work.
>It will also work when multiple controller(s) (of external host) ports have same switch_id (for orchestration).
>And this doesn't break any backward compatibility for non multihost, non smatnic users.
>
>> Jiri, would you mind chiming in? What's your take?
>
>Will wait for his inputs..

I don't see the need for new flavour. The port is still pf same as the
local pf, it only resides on a different host. We just need to make sure
to resolve the conflict between PFX and PFX on 2 different hosts
(local/ext or ext/ext)

So I think that for local PFs, no change is needed.
The external PFs need to have an extra attribute with "external
enumeration" what would be used for the representor netdev name as well.

pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf extnum 0 pfnum 0
pci/0000:00:08.0/3: type eth netdev enp0s8f0_e1pf0 flavour pcipf extnum 1 pfnum 0

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-01  8:19                       ` Jiri Pirko
@ 2020-09-01  8:53                         ` Parav Pandit
  2020-09-01  9:17                           ` Jiri Pirko
  0 siblings, 1 reply; 51+ messages in thread
From: Parav Pandit @ 2020-09-01  8:53 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko


> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Tuesday, September 1, 2020 1:49 PM
> 
> >> > How? How do we tell that pfnum A means external system.
> >> > Want to avoid such 'implicit' notion.
> >>
> >> How do you tell that controller A means external system?
> 
> Perhaps the attr name could be explicitly containing "external" word?
> Like:
> "ext_controller" or "extnum" (similar to "pfnum" and "vfnum") something
> like that.

How about ecnum "external controller number"?
Tiny change in the phys_port_name below example.

> 
> 
> >Which is why I started with annotating only external controllers, mainly to
> avoid renaming and breaking current scheme for non_smartnic cases which
> possibly is the most user base.
> >
> >But probably external pcipf/vf/sf port flavours are more intuitive combined
> with controller number.
> >More below.
> >
> >>
> >> > > > > I can see how having multiple controllers may make things
> >> > > > > clearer, but adding another layer of IDs while the one under
> >> > > > > it is unused
> >> > > > > (pfnum=0) feels very unnecessary.
> >> > > > pfnum=0 is used today. not sure I understand your comment about
> >> > > > being unused. Can you please explain?
> >> > >
> >> > > You examples only ever have pfnum 0:
> >> > >
> >> > Because both controllers have pfnum 0.
> >> >
> >> > > From patch 2:
> >> > >
> >> > > $ devlink port show pci/0000:00:08.0/2
> >> > > pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour
> >> > > pcivf pfnum 0 vfnum 1 splittable false
> >> > >   function:
> >> > >     hw_addr 00:00:00:00:00:00
> >> > >
> >> > > $ devlink port show -jp pci/0000:00:08.0/2 {
> >> > >     "port": {
> >> > >         "pci/0000:00:08.0/1": {
> >> > >             "type": "eth",
> >> > >             "netdev": "eth7",
> >> > >             "controller": 0,
> >> > >             "flavour": "pcivf",
> >> > >             "pfnum": 0,
> >> > >             "vfnum": 1,
> >> > >             "splittable": false,
> >> > >             "function": {
> >> > >                 "hw_addr": "00:00:00:00:00:00"
> >> > >             }
> >> > >         }
> >> > >     }
> >> > > }
> >> > >
> >> > > From earlier email:
> >> > >
> >> > > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> >> > > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> >> > >
> >> > > If you never use pfnum, you can just put the controller ID there,
> >> > > like
> >> Netronome.
> >> > >
> >> > It likely not going to work for us. Because pfnum is not some
> >> > randomly
> >> generated number.
> >> > It is linked to the underlying PCI pf number. {pf0, pf1...}
> >> > Orchestration sw uses this to identify representor of a PF-VF pair.
> >>
> >> For orchestration software which is unaware of controllers ports will
> >> still alias on pf/vf nums.
> >>
> >Yes.
> >Orchestration which will be aware of controller, will use it.
> >
> >> Besides you have one devlink instance per port currently so I'm
> >> guessing there is no pf1 ever, in your case...
> >>
> >Currently there are multiple devlink instance. One for pf0, other for pf1.
> >Ports of both instances have the same switch id.
> >
> >> > Replacing pfnum with controller number breaks this; and it still
> >> > doesn't tell user
> >> that it's the pf on other_host.
> >>
> >> Neither does the opaque controller id.
> >Which is why I tossed the epcipf (external pci pf) port flavour that fits in
> current model.
> >But doesn't allow multiple external hosts under same eswitch for those
> devices which has same pci pf, vf numbers among those hosts. (and it is the
> case for mlnx).
> >
> >> Maybe now you understand better why I wanted peer objects :/
> >>
> >I wasn't against peer object. But showing netdev of peer object assumed
> no_smartnic, it also assume other_side is also similar Linux kernel.
> >Anyways, I make humble request get over the past to move forward. :-)
> >
> >> > So it is used, and would like to continue to use even if there are
> >> > multiple PFs
> >> port (that has same pfnum) under the same eswitch.
> >> >
> >> > In an alternative,
> >> > Currently we have pcipf, pcivf (and pcisf) flavours. May be if we
> >> > introduce new
> >> flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
> >> > There can be better name than epcipf. I just put epcipf to differentiate
> it.
> >> > However these ports have same attributes as pcipf, pcivf, pcisf flavours.
> >>
> >> I don't think the controllers are a terrible idea. Seems like a
> >> fairly reasonable extension.
> >Ok.
> >> But MLX don't seem to need them. And you have a history of trying to
> >> make the Linux APIs look like your FW API.
> >>
> >Because there are two devlink instances for each PF?
> >I think for now an epcipf, epcivf flavour would just suffice due to lack of
> multiple devlink instances.
> >But in long run it is better to have the controller covering few topologies.
> >Otherwise we will break the rep naming later when multiple controllers are
> managed by single eswitch (without notion of controller).
> >
> >Sometime my text is confusing. :-) so adding example of the thoughts
> below.
> >Example: Eswitch side devlink port show for multi-host setup considering
> the smartnic.
> >
> >$ devlink port show
> >pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
> >pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
> >pci/0000:00:08.0/2: type eth netdev enp0s8f0_c0pf0 flavour epcipf pfnum 0
> >                                                                                                             ^^^^^ new port
> flavour.
> >pci/0000:00:08.1/0: type eth netdev enp0s8f1 flavour physical
> >pci/0000:00:08.1/1: type eth netdev enp0s8f1_pf1 flavour pcipf pfnum 1
> >pci/0000:00:08.1/2: type eth netdev enp0s8f1_c0pf1 flavour epcipf pfnum
> >1
> >
> >Here one controller has two pci pfs (0,1}. Eswitch shows that they are
> external pci ports.
> >Whenever (not sure when), mlnx converts to single devlink instance, this
> will continue to work.
> >It will also work when multiple controller(s) (of external host) ports have
> same switch_id (for orchestration).
> >And this doesn't break any backward compatibility for non multihost, non
> smatnic users.
> >
> >> Jiri, would you mind chiming in? What's your take?
> >
> >Will wait for his inputs..
> 
> I don't see the need for new flavour. The port is still pf same as the local pf, it
> only resides on a different host. We just need to make sure to resolve the
> conflict between PFX and PFX on 2 different hosts (local/ext or ext/ext)
> 
Yes. I agree. I do not have strong opinion on new flavour as long as we make clear that this is for the external controller.

> So I think that for local PFs, no change is needed.
Yep.

> The external PFs need to have an extra attribute with "external
> enumeration" what would be used for the representor netdev name as well.
> 
> pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
> pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf extnum 0
> pfnum 0

How about a prefix of "ec" instead of "e", like?
pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf ecnum 0 pfnum 0
                                                                                     ^^^^
> pci/0000:00:08.0/3: type eth netdev enp0s8f0_e1pf0 flavour pcipf extnum 1
> pfnum 0


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-01  8:53                         ` Parav Pandit
@ 2020-09-01  9:17                           ` Jiri Pirko
  2020-09-01 21:28                             ` Jakub Kicinski
  0 siblings, 1 reply; 51+ messages in thread
From: Jiri Pirko @ 2020-09-01  9:17 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jakub Kicinski, Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

Tue, Sep 01, 2020 at 10:53:23AM CEST, parav@nvidia.com wrote:
>
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: Tuesday, September 1, 2020 1:49 PM
>> 
>> >> > How? How do we tell that pfnum A means external system.
>> >> > Want to avoid such 'implicit' notion.
>> >>
>> >> How do you tell that controller A means external system?
>> 
>> Perhaps the attr name could be explicitly containing "external" word?
>> Like:
>> "ext_controller" or "extnum" (similar to "pfnum" and "vfnum") something
>> like that.
>
>How about ecnum "external controller number"?
>Tiny change in the phys_port_name below example.
>
>> 
>> 
>> >Which is why I started with annotating only external controllers, mainly to
>> avoid renaming and breaking current scheme for non_smartnic cases which
>> possibly is the most user base.
>> >
>> >But probably external pcipf/vf/sf port flavours are more intuitive combined
>> with controller number.
>> >More below.
>> >
>> >>
>> >> > > > > I can see how having multiple controllers may make things
>> >> > > > > clearer, but adding another layer of IDs while the one under
>> >> > > > > it is unused
>> >> > > > > (pfnum=0) feels very unnecessary.
>> >> > > > pfnum=0 is used today. not sure I understand your comment about
>> >> > > > being unused. Can you please explain?
>> >> > >
>> >> > > You examples only ever have pfnum 0:
>> >> > >
>> >> > Because both controllers have pfnum 0.
>> >> >
>> >> > > From patch 2:
>> >> > >
>> >> > > $ devlink port show pci/0000:00:08.0/2
>> >> > > pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour
>> >> > > pcivf pfnum 0 vfnum 1 splittable false
>> >> > >   function:
>> >> > >     hw_addr 00:00:00:00:00:00
>> >> > >
>> >> > > $ devlink port show -jp pci/0000:00:08.0/2 {
>> >> > >     "port": {
>> >> > >         "pci/0000:00:08.0/1": {
>> >> > >             "type": "eth",
>> >> > >             "netdev": "eth7",
>> >> > >             "controller": 0,
>> >> > >             "flavour": "pcivf",
>> >> > >             "pfnum": 0,
>> >> > >             "vfnum": 1,
>> >> > >             "splittable": false,
>> >> > >             "function": {
>> >> > >                 "hw_addr": "00:00:00:00:00:00"
>> >> > >             }
>> >> > >         }
>> >> > >     }
>> >> > > }
>> >> > >
>> >> > > From earlier email:
>> >> > >
>> >> > > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
>> >> > > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
>> >> > >
>> >> > > If you never use pfnum, you can just put the controller ID there,
>> >> > > like
>> >> Netronome.
>> >> > >
>> >> > It likely not going to work for us. Because pfnum is not some
>> >> > randomly
>> >> generated number.
>> >> > It is linked to the underlying PCI pf number. {pf0, pf1...}
>> >> > Orchestration sw uses this to identify representor of a PF-VF pair.
>> >>
>> >> For orchestration software which is unaware of controllers ports will
>> >> still alias on pf/vf nums.
>> >>
>> >Yes.
>> >Orchestration which will be aware of controller, will use it.
>> >
>> >> Besides you have one devlink instance per port currently so I'm
>> >> guessing there is no pf1 ever, in your case...
>> >>
>> >Currently there are multiple devlink instance. One for pf0, other for pf1.
>> >Ports of both instances have the same switch id.
>> >
>> >> > Replacing pfnum with controller number breaks this; and it still
>> >> > doesn't tell user
>> >> that it's the pf on other_host.
>> >>
>> >> Neither does the opaque controller id.
>> >Which is why I tossed the epcipf (external pci pf) port flavour that fits in
>> current model.
>> >But doesn't allow multiple external hosts under same eswitch for those
>> devices which has same pci pf, vf numbers among those hosts. (and it is the
>> case for mlnx).
>> >
>> >> Maybe now you understand better why I wanted peer objects :/
>> >>
>> >I wasn't against peer object. But showing netdev of peer object assumed
>> no_smartnic, it also assume other_side is also similar Linux kernel.
>> >Anyways, I make humble request get over the past to move forward. :-)
>> >
>> >> > So it is used, and would like to continue to use even if there are
>> >> > multiple PFs
>> >> port (that has same pfnum) under the same eswitch.
>> >> >
>> >> > In an alternative,
>> >> > Currently we have pcipf, pcivf (and pcisf) flavours. May be if we
>> >> > introduce new
>> >> flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
>> >> > There can be better name than epcipf. I just put epcipf to differentiate
>> it.
>> >> > However these ports have same attributes as pcipf, pcivf, pcisf flavours.
>> >>
>> >> I don't think the controllers are a terrible idea. Seems like a
>> >> fairly reasonable extension.
>> >Ok.
>> >> But MLX don't seem to need them. And you have a history of trying to
>> >> make the Linux APIs look like your FW API.
>> >>
>> >Because there are two devlink instances for each PF?
>> >I think for now an epcipf, epcivf flavour would just suffice due to lack of
>> multiple devlink instances.
>> >But in long run it is better to have the controller covering few topologies.
>> >Otherwise we will break the rep naming later when multiple controllers are
>> managed by single eswitch (without notion of controller).
>> >
>> >Sometime my text is confusing. :-) so adding example of the thoughts
>> below.
>> >Example: Eswitch side devlink port show for multi-host setup considering
>> the smartnic.
>> >
>> >$ devlink port show
>> >pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
>> >pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
>> >pci/0000:00:08.0/2: type eth netdev enp0s8f0_c0pf0 flavour epcipf pfnum 0
>> >                                                                                                             ^^^^^ new port
>> flavour.
>> >pci/0000:00:08.1/0: type eth netdev enp0s8f1 flavour physical
>> >pci/0000:00:08.1/1: type eth netdev enp0s8f1_pf1 flavour pcipf pfnum 1
>> >pci/0000:00:08.1/2: type eth netdev enp0s8f1_c0pf1 flavour epcipf pfnum
>> >1
>> >
>> >Here one controller has two pci pfs (0,1}. Eswitch shows that they are
>> external pci ports.
>> >Whenever (not sure when), mlnx converts to single devlink instance, this
>> will continue to work.
>> >It will also work when multiple controller(s) (of external host) ports have
>> same switch_id (for orchestration).
>> >And this doesn't break any backward compatibility for non multihost, non
>> smatnic users.
>> >
>> >> Jiri, would you mind chiming in? What's your take?
>> >
>> >Will wait for his inputs..
>> 
>> I don't see the need for new flavour. The port is still pf same as the local pf, it
>> only resides on a different host. We just need to make sure to resolve the
>> conflict between PFX and PFX on 2 different hosts (local/ext or ext/ext)
>> 
>Yes. I agree. I do not have strong opinion on new flavour as long as we make clear that this is for the external controller.
>
>> So I think that for local PFs, no change is needed.
>Yep.
>
>> The external PFs need to have an extra attribute with "external
>> enumeration" what would be used for the representor netdev name as well.
>> 
>> pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
>> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
>> pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf extnum 0
>> pfnum 0
>
>How about a prefix of "ec" instead of "e", like?
>pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf ecnum 0 pfnum 0

Yeah, looks fine to me. Jakub?


>                                                                                     ^^^^
>> pci/0000:00:08.0/3: type eth netdev enp0s8f0_e1pf0 flavour pcipf extnum 1
>> pfnum 0
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-01  9:17                           ` Jiri Pirko
@ 2020-09-01 21:28                             ` Jakub Kicinski
  2020-09-02  4:26                               ` Parav Pandit
  0 siblings, 1 reply; 51+ messages in thread
From: Jakub Kicinski @ 2020-09-01 21:28 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Parav Pandit, Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

On Tue, 1 Sep 2020 11:17:42 +0200 Jiri Pirko wrote:
> >> The external PFs need to have an extra attribute with "external
> >> enumeration" what would be used for the representor netdev name as well.
> >> 
> >> pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
> >> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
> >> pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf extnum 0
> >> pfnum 0  
> >
> >How about a prefix of "ec" instead of "e", like?
> >pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf ecnum 0 pfnum 0  
> 
> Yeah, looks fine to me. Jakub?

I don't like that local port doesn't have the controller ID.

Whether PCI port is external or not is best described by a the peer
relation. Failing that, at the very least "external" should be a
separate attribute/flag from the controller ID.

I didn't quite get the fact that you want to not show controller ID 
on the local port, initially.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-01 21:28                             ` Jakub Kicinski
@ 2020-09-02  4:26                               ` Parav Pandit
  2020-09-02  4:44                                 ` Parav Pandit
  2020-09-02  8:00                                 ` Jiri Pirko
  0 siblings, 2 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-02  4:26 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko



> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, September 2, 2020 2:59 AM
> 
> On Tue, 1 Sep 2020 11:17:42 +0200 Jiri Pirko wrote:
> > >> The external PFs need to have an extra attribute with "external
> > >> enumeration" what would be used for the representor netdev name as well.
> > >>
> > >> pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
> > >> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf
> > >> pfnum 0
> > >> pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf
> > >> extnum 0 pfnum 0
> > >
> > >How about a prefix of "ec" instead of "e", like?
> > >pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf
> > >ecnum 0 pfnum 0
> >
> > Yeah, looks fine to me. Jakub?
> 
> I don't like that local port doesn't have the controller ID.
> 
Adding controller ID to local port will change name for all non smartnic deployments that affects current vast user base :-(

> Whether PCI port is external or not is best described by a the peer relation.

How about adding an attribute something like below in addition to controller id.

$ devlink port show
pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 ecnum 0 external true splitable false
                                                                                                                                                    ^^^^^^^^^^^

> Failing that, at the very least "external" should be a separate attribute/flag from
> the controller ID.
>
Ok. Looks fine to me.

Jiri?

> I didn't quite get the fact that you want to not show controller ID on the local
> port, initially.
Mainly to not_break current users.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-02  4:26                               ` Parav Pandit
@ 2020-09-02  4:44                                 ` Parav Pandit
  2020-09-02  8:00                                 ` Jiri Pirko
  1 sibling, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-02  4:44 UTC (permalink / raw)
  To: Parav Pandit, Jakub Kicinski, Jiri Pirko
  Cc: Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko



> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Parav Pandit
> 
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Wednesday, September 2, 2020 2:59 AM
> >
> > On Tue, 1 Sep 2020 11:17:42 +0200 Jiri Pirko wrote:
> > > >> The external PFs need to have an extra attribute with "external
> > > >> enumeration" what would be used for the representor netdev name as
> well.
> > > >>
> > > >> pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
> > > >> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf
> > > >> pfnum 0
> > > >> pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf
> > > >> extnum 0 pfnum 0
> > > >
> > > >How about a prefix of "ec" instead of "e", like?
> > > >pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf
> > > >ecnum 0 pfnum 0
> > >
> > > Yeah, looks fine to me. Jakub?
> >
> > I don't like that local port doesn't have the controller ID.
> >
> Adding controller ID to local port will change name for all non smartnic
> deployments that affects current vast user base :-(
> 
> > Whether PCI port is external or not is best described by a the peer relation.
> 
> How about adding an attribute something like below in addition to controller id.
> 
> $ devlink port show
> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 ecnum
> 0 external true splitable false
> 
> ^^^^^^^^^^^
> 
I am sorry for messing up the example in previous email.
Please find below examples with controller number and external attribute flag.

$ devlink port show
pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 external false splittable false
pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf pfnum 0 ecnum 0 external true splittable false

> > Failing that, at the very least "external" should be a separate
> > attribute/flag from the controller ID.
> >
> Ok. Looks fine to me.
> 
> Jiri?
> 
> > I didn't quite get the fact that you want to not show controller ID on
> > the local port, initially.
> Mainly to not_break current users.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-02  4:26                               ` Parav Pandit
  2020-09-02  4:44                                 ` Parav Pandit
@ 2020-09-02  8:00                                 ` Jiri Pirko
  2020-09-02 15:23                                   ` Jakub Kicinski
  1 sibling, 1 reply; 51+ messages in thread
From: Jiri Pirko @ 2020-09-02  8:00 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jakub Kicinski, Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

Wed, Sep 02, 2020 at 06:26:12AM CEST, parav@nvidia.com wrote:
>
>
>> From: Jakub Kicinski <kuba@kernel.org>
>> Sent: Wednesday, September 2, 2020 2:59 AM
>> 
>> On Tue, 1 Sep 2020 11:17:42 +0200 Jiri Pirko wrote:
>> > >> The external PFs need to have an extra attribute with "external
>> > >> enumeration" what would be used for the representor netdev name as well.
>> > >>
>> > >> pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
>> > >> pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf
>> > >> pfnum 0
>> > >> pci/0000:00:08.0/2: type eth netdev enp0s8f0_e0pf0 flavour pcipf
>> > >> extnum 0 pfnum 0
>> > >
>> > >How about a prefix of "ec" instead of "e", like?
>> > >pci/0000:00:08.0/2: type eth netdev enp0s8f0_ec0pf0 flavour pcipf
>> > >ecnum 0 pfnum 0
>> >
>> > Yeah, looks fine to me. Jakub?
>> 
>> I don't like that local port doesn't have the controller ID.
>> 
>Adding controller ID to local port will change name for all non smartnic deployments that affects current vast user base :-(
>
>> Whether PCI port is external or not is best described by a the peer relation.
>
>How about adding an attribute something like below in addition to controller id.
>
>$ devlink port show
>pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 ecnum 0 external true splitable false
>                                                                                                                                                    ^^^^^^^^^^^
>
>> Failing that, at the very least "external" should be a separate attribute/flag from
>> the controller ID.
>>
>Ok. Looks fine to me.
>
>Jiri?

Yeah, why not.

>
>> I didn't quite get the fact that you want to not show controller ID on the local
>> port, initially.
>Mainly to not_break current users.

You don't have to take it to the name, unless "external" flag is set.

But I don't really see the point of showing !external, cause such
controller number would be always 0. Jakub, why do you think it is
needed?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-02  8:00                                 ` Jiri Pirko
@ 2020-09-02 15:23                                   ` Jakub Kicinski
  2020-09-02 16:18                                     ` Parav Pandit
  2020-09-03  5:54                                     ` Jiri Pirko
  0 siblings, 2 replies; 51+ messages in thread
From: Jakub Kicinski @ 2020-09-02 15:23 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Parav Pandit, Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:
>>> I didn't quite get the fact that you want to not show controller ID on the local
>>> port, initially.  
>> Mainly to not_break current users.  
> 
> You don't have to take it to the name, unless "external" flag is set.
> 
> But I don't really see the point of showing !external, cause such
> controller number would be always 0. Jakub, why do you think it is
> needed?

It may seem reasonable for a smartNIC where there are only two
controllers, and all you really need is that external flag. 

In a general case when users are trying to figure out the topology
not knowing which controller they are sitting at looks like a serious
limitation.

Example - multi-host system and you want to know which controller you
are to run power cycle from the BMC side.

We won't be able to change that because it'd change the names for you.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-02 15:23                                   ` Jakub Kicinski
@ 2020-09-02 16:18                                     ` Parav Pandit
  2020-09-02 20:10                                       ` Parav Pandit
  2020-09-03  5:54                                     ` Jiri Pirko
  1 sibling, 1 reply; 51+ messages in thread
From: Parav Pandit @ 2020-09-02 16:18 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

Hi Jakub,

> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, September 2, 2020 8:54 PM
> 
> On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:
> >>> I didn't quite get the fact that you want to not show controller ID
> >>> on the local port, initially.
> >> Mainly to not_break current users.
> >
> > You don't have to take it to the name, unless "external" flag is set.
> >
> > But I don't really see the point of showing !external, cause such
> > controller number would be always 0. Jakub, why do you think it is
> > needed?
> 
> It may seem reasonable for a smartNIC where there are only two controllers,
> and all you really need is that external flag.
> 
> In a general case when users are trying to figure out the topology not
> knowing which controller they are sitting at looks like a serious limitation.
> 
> Example - multi-host system and you want to know which controller you are
> to run power cycle from the BMC side.
> 
> We won't be able to change that because it'd change the names for you.

Is BMC controller running devlink instance?
How the power outlet and devlink instance are connected?
Can you please provide the example to understand the relation?


^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-02 16:18                                     ` Parav Pandit
@ 2020-09-02 20:10                                       ` Parav Pandit
  0 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-02 20:10 UTC (permalink / raw)
  To: Parav Pandit, Jakub Kicinski, Jiri Pirko
  Cc: Parav Pandit, davem, netdev, Roi Dayan, Saeed Mahameed, Jiri Pirko


> From: Parav Pandit <parav@nvidia.com>
> Sent: Wednesday, September 2, 2020 9:49 PM
> 
> Hi Jakub,
> 
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Wednesday, September 2, 2020 8:54 PM
> >
> > On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:
> > >>> I didn't quite get the fact that you want to not show controller
> > >>> ID on the local port, initially.
> > >> Mainly to not_break current users.
> > >
> > > You don't have to take it to the name, unless "external" flag is set.
> > >
> > > But I don't really see the point of showing !external, cause such
> > > controller number would be always 0. Jakub, why do you think it is
> > > needed?
> >
> > It may seem reasonable for a smartNIC where there are only two
> > controllers, and all you really need is that external flag.
> >
With only an external flag how shall we differentiate more than one external controllers?

This is the example of on a single physical smartnic device which is serving two hosts.
Each external host has one controller (c1, c2).
Each controller consist of two PCI PFs. (marked with external = true)
Devlink instance is service the local controller too.

cnum = controller number.
external = true/false describes if its external port or local.

Below naming scheme enables one or more controllers, one or more PFs to be managed by individual devlink instance per controller or one devlink instance for all controller without any ambiguity.
Does this look good?

$ devlink port show
pci/0000:00:08.0/0: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 external false
pci/0000:00:08.0/1: type eth netdev enp0s8f0_c1pf0 flavour pcipf pfnum 0 cnum 1 external true

pci/0000:00:08.1/0: type eth netdev enp0s8f1_pf1 flavour pcipf pfnum 1 external false
pci/0000:00:08.1/1: type eth netdev enp0s8f1_c1pf1 flavour pcipf pfnum 1 cnum 1 external true

pci/0000:00:08.2/0: type eth netdev enp0s8f2_pf0 flavour pcipf pfnum 0 external false
pci/0000:00:08.2/1: type eth netdev enp0s8f2_c2pf0 flavour pcipf pfnum 0 cnum 2 external true

pci/0000:00:08.3/0: type eth netdev enp0s8f3_pf1 flavour pcipf pfnum 1 external false
pci/0000:00:08.3/1: type eth netdev enp0s8f3_c2pf1 flavour pcipf pfnum 1 cnum 2 external true

> > In a general case when users are trying to figure out the topology not
> > knowing which controller they are sitting at looks like a serious limitation.
> >
> > Example - multi-host system and you want to know which controller you
> > are to run power cycle from the BMC side.
> >
Did you mean a controller inside the host itself (not in smrtnic) needs to know what is its controller identifier?

> > We won't be able to change that because it'd change the names for you.
> 
> Is BMC controller running devlink instance?
> How the power outlet and devlink instance are connected?
> Can you please provide the example to understand the relation?


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-02 15:23                                   ` Jakub Kicinski
  2020-09-02 16:18                                     ` Parav Pandit
@ 2020-09-03  5:54                                     ` Jiri Pirko
  2020-09-03 19:31                                       ` Jakub Kicinski
  1 sibling, 1 reply; 51+ messages in thread
From: Jiri Pirko @ 2020-09-03  5:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Parav Pandit, Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

Wed, Sep 02, 2020 at 05:23:58PM CEST, kuba@kernel.org wrote:
>On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:
>>>> I didn't quite get the fact that you want to not show controller ID on the local
>>>> port, initially.  
>>> Mainly to not_break current users.  
>> 
>> You don't have to take it to the name, unless "external" flag is set.
>> 
>> But I don't really see the point of showing !external, cause such
>> controller number would be always 0. Jakub, why do you think it is
>> needed?
>
>It may seem reasonable for a smartNIC where there are only two
>controllers, and all you really need is that external flag. 
>
>In a general case when users are trying to figure out the topology
>not knowing which controller they are sitting at looks like a serious
>limitation.

I think we misunderstood each other. I never proposed just "external"
flag. What I propose is either:
1) ecnum attribute absent for local
   ecnum attribute absent set to 0 for external controller X
   ecnum attribute absent set to 1 for external controller Y
   ...

or:
2) ecnum attribute absent for local, external flag set to false
   ecnum attribute absent set to 0 for external controller X, external flag set to true
   ecnum attribute absent set to 1 for external controller Y, external flag set to true

>
>Example - multi-host system and you want to know which controller you
>are to run power cycle from the BMC side.
>
>We won't be able to change that because it'd change the names for you.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-03  5:54                                     ` Jiri Pirko
@ 2020-09-03 19:31                                       ` Jakub Kicinski
  2020-09-04  8:43                                         ` Jiri Pirko
  0 siblings, 1 reply; 51+ messages in thread
From: Jakub Kicinski @ 2020-09-03 19:31 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Parav Pandit, Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

On Thu, 3 Sep 2020 07:54:39 +0200 Jiri Pirko wrote:
> Wed, Sep 02, 2020 at 05:23:58PM CEST, kuba@kernel.org wrote:
> >On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:  
> >>>> I didn't quite get the fact that you want to not show controller ID on the local
> >>>> port, initially.    
> >>> Mainly to not_break current users.    
> >> 
> >> You don't have to take it to the name, unless "external" flag is set.
> >> 
> >> But I don't really see the point of showing !external, cause such
> >> controller number would be always 0. Jakub, why do you think it is
> >> needed?  
> >
> >It may seem reasonable for a smartNIC where there are only two
> >controllers, and all you really need is that external flag. 
> >
> >In a general case when users are trying to figure out the topology
> >not knowing which controller they are sitting at looks like a serious
> >limitation.  
> 
> I think we misunderstood each other. I never proposed just "external"
> flag.

Sorry, I was just saying that assuming a single host SmartNIC the
controller ID is not necessary at all. You never suggested that, I did. 
Looks like I just confused everyone with that comment :(

Different controller ID for different PFs but the same PCIe link would
be very wrong. So please clarify - if I have a 2 port smartNIC, with on
PCIe link to the host, and the embedded controller - what would I see?

> What I propose is either:
> 1) ecnum attribute absent for local
>    ecnum attribute absent set to 0 for external controller X
>    ecnum attribute absent set to 1 for external controller Y
>    ...
> 
> or:
> 2) ecnum attribute absent for local, external flag set to false
>    ecnum attribute absent set to 0 for external controller X, external flag set to true
>    ecnum attribute absent set to 1 for external controller Y, external flag set to true

I'm saying that I do want to see the the controller ID for all ports.

So:

3) local:   { "controller ID": x }
   remote1: { "controller ID": y, "external": true }
   remote1: { "controller ID": z, "external": true }

We don't have to put the controller ID in the name for local ports, but
the attribute should be reported. AFAIU name was your main concern, no?

> >Example - multi-host system and you want to know which controller you
> >are to run power cycle from the BMC side.
> >
> >We won't be able to change that because it'd change the names for you.  

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-03 19:31                                       ` Jakub Kicinski
@ 2020-09-04  8:43                                         ` Jiri Pirko
  2020-09-06  3:08                                           ` Parav Pandit
  0 siblings, 1 reply; 51+ messages in thread
From: Jiri Pirko @ 2020-09-04  8:43 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Parav Pandit, Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

Thu, Sep 03, 2020 at 09:31:23PM CEST, kuba@kernel.org wrote:
>On Thu, 3 Sep 2020 07:54:39 +0200 Jiri Pirko wrote:
>> Wed, Sep 02, 2020 at 05:23:58PM CEST, kuba@kernel.org wrote:
>> >On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:  
>> >>>> I didn't quite get the fact that you want to not show controller ID on the local
>> >>>> port, initially.    
>> >>> Mainly to not_break current users.    
>> >> 
>> >> You don't have to take it to the name, unless "external" flag is set.
>> >> 
>> >> But I don't really see the point of showing !external, cause such
>> >> controller number would be always 0. Jakub, why do you think it is
>> >> needed?  
>> >
>> >It may seem reasonable for a smartNIC where there are only two
>> >controllers, and all you really need is that external flag. 
>> >
>> >In a general case when users are trying to figure out the topology
>> >not knowing which controller they are sitting at looks like a serious
>> >limitation.  
>> 
>> I think we misunderstood each other. I never proposed just "external"
>> flag.
>
>Sorry, I was just saying that assuming a single host SmartNIC the
>controller ID is not necessary at all. You never suggested that, I did. 
>Looks like I just confused everyone with that comment :(
>
>Different controller ID for different PFs but the same PCIe link would
>be very wrong. So please clarify - if I have a 2 port smartNIC, with on
>PCIe link to the host, and the embedded controller - what would I see?

Parav?


>
>> What I propose is either:
>> 1) ecnum attribute absent for local
>>    ecnum attribute absent set to 0 for external controller X
>>    ecnum attribute absent set to 1 for external controller Y
>>    ...
>> 
>> or:
>> 2) ecnum attribute absent for local, external flag set to false
>>    ecnum attribute absent set to 0 for external controller X, external flag set to true
>>    ecnum attribute absent set to 1 for external controller Y, external flag set to true
>
>I'm saying that I do want to see the the controller ID for all ports.
>
>So:
>
>3) local:   { "controller ID": x }
>   remote1: { "controller ID": y, "external": true }
>   remote1: { "controller ID": z, "external": true }
>
>We don't have to put the controller ID in the name for local ports, but
>the attribute should be reported. AFAIU name was your main concern, no?

Okay. Sounds fine. Let's put the controller number there for all ports.
ctrlnum X external true
ctrlnum Y external false

if (!external)
	ignore the ctrlnum when generating the name


>
>> >Example - multi-host system and you want to know which controller you
>> >are to run power cycle from the BMC side.
>> >
>> >We won't be able to change that because it'd change the names for you.  

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-04  8:43                                         ` Jiri Pirko
@ 2020-09-06  3:08                                           ` Parav Pandit
  2020-09-06 16:46                                             ` Jakub Kicinski
  2020-09-07  7:21                                             ` Jiri Pirko
  0 siblings, 2 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-06  3:08 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko



> From: Jiri Pirko <jiri@resnulli.us>
> Sent: Friday, September 4, 2020 2:13 PM
> 
> Thu, Sep 03, 2020 at 09:31:23PM CEST, kuba@kernel.org wrote:
> >On Thu, 3 Sep 2020 07:54:39 +0200 Jiri Pirko wrote:
> >> Wed, Sep 02, 2020 at 05:23:58PM CEST, kuba@kernel.org wrote:
> >> >On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:
> >> >>>> I didn't quite get the fact that you want to not show controller ID on the
> local
> >> >>>> port, initially.
> >> >>> Mainly to not_break current users.
> >> >>
> >> >> You don't have to take it to the name, unless "external" flag is set.
> >> >>
> >> >> But I don't really see the point of showing !external, cause such
> >> >> controller number would be always 0. Jakub, why do you think it is
> >> >> needed?
> >> >
> >> >It may seem reasonable for a smartNIC where there are only two
> >> >controllers, and all you really need is that external flag.
> >> >
> >> >In a general case when users are trying to figure out the topology
> >> >not knowing which controller they are sitting at looks like a
> >> >serious limitation.
> >>
> >> I think we misunderstood each other. I never proposed just "external"
> >> flag.
> >
> >Sorry, I was just saying that assuming a single host SmartNIC the
> >controller ID is not necessary at all. You never suggested that, I did.
> >Looks like I just confused everyone with that comment :(
> >
> >Different controller ID for different PFs but the same PCIe link would
> >be very wrong. So please clarify - if I have a 2 port smartNIC, with on
> >PCIe link to the host, and the embedded controller - what would I see?
> 
> Parav?
>
One controller id for both such PFs.
I liked the idea of putting controller number for all the ports but not embedded for local ports.

> 
> >
> >> What I propose is either:
> >> 1) ecnum attribute absent for local
> >>    ecnum attribute absent set to 0 for external controller X
> >>    ecnum attribute absent set to 1 for external controller Y
> >>    ...
> >>
> >> or:
> >> 2) ecnum attribute absent for local, external flag set to false
> >>    ecnum attribute absent set to 0 for external controller X, external flag set
> to true
> >>    ecnum attribute absent set to 1 for external controller Y,
> >> external flag set to true
> >
> >I'm saying that I do want to see the the controller ID for all ports.
> >
> >So:
> >
> >3) local:   { "controller ID": x }
> >   remote1: { "controller ID": y, "external": true }
> >   remote1: { "controller ID": z, "external": true }
> >
> >We don't have to put the controller ID in the name for local ports, but
> >the attribute should be reported. AFAIU name was your main concern, no?
> 
> Okay. Sounds fine. Let's put the controller number there for all ports.
> ctrlnum X external true
> ctrlnum Y external false
> 
> if (!external)
> 	ignore the ctrlnum when generating the name
> 

Putting little more realistic example for Jakub's and your suggestion below.

Below is the output for 3 controllers. ( 2 external + 1 local )
Each external controller consist of 2 PCI PFs for a external host via single PCIe cable.
Each local controller consist of 1 PCI PF.

$ devlink port show
pci/0000:00:08.0/0: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 cnum 0 external false
pci/0000:00:08.0/1: type eth netdev enp0s8f0_c1pf0 flavour pcipf pfnum 0 cnum 1 external true
pci/0000:00:08.1/1: type eth netdev enp0s8f1_c1pf1 flavour pcipf pfnum 1 cnum 1 external true

Looks ok?

> 
> >
> >> >Example - multi-host system and you want to know which controller
> >> >you are to run power cycle from the BMC side.
> >> >
> >> >We won't be able to change that because it'd change the names for you.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-06  3:08                                           ` Parav Pandit
@ 2020-09-06 16:46                                             ` Jakub Kicinski
  2020-09-07  7:21                                             ` Jiri Pirko
  1 sibling, 0 replies; 51+ messages in thread
From: Jakub Kicinski @ 2020-09-06 16:46 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jiri Pirko, Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

On Sun, 6 Sep 2020 03:08:45 +0000 Parav Pandit wrote:
> > >3) local:   { "controller ID": x }
> > >   remote1: { "controller ID": y, "external": true }
> > >   remote1: { "controller ID": z, "external": true }
> > >
> > >We don't have to put the controller ID in the name for local ports, but
> > >the attribute should be reported. AFAIU name was your main concern, no?  
> > 
> > Okay. Sounds fine. Let's put the controller number there for all ports.
> > ctrlnum X external true
> > ctrlnum Y external false
> > 
> > if (!external)
> > 	ignore the ctrlnum when generating the name
> >   
> 
> Putting little more realistic example for Jakub's and your suggestion below.
> 
> Below is the output for 3 controllers. ( 2 external + 1 local )
> Each external controller consist of 2 PCI PFs for a external host via single PCIe cable.
> Each local controller consist of 1 PCI PF.
> 
> $ devlink port show
> pci/0000:00:08.0/0: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 cnum 0 external false
> pci/0000:00:08.0/1: type eth netdev enp0s8f0_c1pf0 flavour pcipf pfnum 0 cnum 1 external true
> pci/0000:00:08.1/1: type eth netdev enp0s8f1_c1pf1 flavour pcipf pfnum 1 cnum 1 external true
> 
> Looks ok?

Yup, looks good, thanks.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-06  3:08                                           ` Parav Pandit
  2020-09-06 16:46                                             ` Jakub Kicinski
@ 2020-09-07  7:21                                             ` Jiri Pirko
  2020-09-07 16:18                                               ` Jakub Kicinski
  1 sibling, 1 reply; 51+ messages in thread
From: Jiri Pirko @ 2020-09-07  7:21 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jakub Kicinski, Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

Sun, Sep 06, 2020 at 05:08:45AM CEST, parav@nvidia.com wrote:
>
>
>> From: Jiri Pirko <jiri@resnulli.us>
>> Sent: Friday, September 4, 2020 2:13 PM
>> 
>> Thu, Sep 03, 2020 at 09:31:23PM CEST, kuba@kernel.org wrote:
>> >On Thu, 3 Sep 2020 07:54:39 +0200 Jiri Pirko wrote:
>> >> Wed, Sep 02, 2020 at 05:23:58PM CEST, kuba@kernel.org wrote:
>> >> >On Wed, 2 Sep 2020 10:00:11 +0200 Jiri Pirko wrote:
>> >> >>>> I didn't quite get the fact that you want to not show controller ID on the
>> local
>> >> >>>> port, initially.
>> >> >>> Mainly to not_break current users.
>> >> >>
>> >> >> You don't have to take it to the name, unless "external" flag is set.
>> >> >>
>> >> >> But I don't really see the point of showing !external, cause such
>> >> >> controller number would be always 0. Jakub, why do you think it is
>> >> >> needed?
>> >> >
>> >> >It may seem reasonable for a smartNIC where there are only two
>> >> >controllers, and all you really need is that external flag.
>> >> >
>> >> >In a general case when users are trying to figure out the topology
>> >> >not knowing which controller they are sitting at looks like a
>> >> >serious limitation.
>> >>
>> >> I think we misunderstood each other. I never proposed just "external"
>> >> flag.
>> >
>> >Sorry, I was just saying that assuming a single host SmartNIC the
>> >controller ID is not necessary at all. You never suggested that, I did.
>> >Looks like I just confused everyone with that comment :(
>> >
>> >Different controller ID for different PFs but the same PCIe link would
>> >be very wrong. So please clarify - if I have a 2 port smartNIC, with on
>> >PCIe link to the host, and the embedded controller - what would I see?
>> 
>> Parav?
>>
>One controller id for both such PFs.
>I liked the idea of putting controller number for all the ports but not embedded for local ports.
>
>> 
>> >
>> >> What I propose is either:
>> >> 1) ecnum attribute absent for local
>> >>    ecnum attribute absent set to 0 for external controller X
>> >>    ecnum attribute absent set to 1 for external controller Y
>> >>    ...
>> >>
>> >> or:
>> >> 2) ecnum attribute absent for local, external flag set to false
>> >>    ecnum attribute absent set to 0 for external controller X, external flag set
>> to true
>> >>    ecnum attribute absent set to 1 for external controller Y,
>> >> external flag set to true
>> >
>> >I'm saying that I do want to see the the controller ID for all ports.
>> >
>> >So:
>> >
>> >3) local:   { "controller ID": x }
>> >   remote1: { "controller ID": y, "external": true }
>> >   remote1: { "controller ID": z, "external": true }
>> >
>> >We don't have to put the controller ID in the name for local ports, but
>> >the attribute should be reported. AFAIU name was your main concern, no?
>> 
>> Okay. Sounds fine. Let's put the controller number there for all ports.
>> ctrlnum X external true
>> ctrlnum Y external false
>> 
>> if (!external)
>> 	ignore the ctrlnum when generating the name
>> 
>
>Putting little more realistic example for Jakub's and your suggestion below.
>
>Below is the output for 3 controllers. ( 2 external + 1 local )
>Each external controller consist of 2 PCI PFs for a external host via single PCIe cable.
>Each local controller consist of 1 PCI PF.
>
>$ devlink port show
>pci/0000:00:08.0/0: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 cnum 0 external false
>pci/0000:00:08.0/1: type eth netdev enp0s8f0_c1pf0 flavour pcipf pfnum 0 cnum 1 external true
>pci/0000:00:08.1/1: type eth netdev enp0s8f1_c1pf1 flavour pcipf pfnum 1 cnum 1 external true

I see cnum 0 and cnum 1, yet you talk about 3 controllers. What did I
miss?


>
>Looks ok?
>
>> 
>> >
>> >> >Example - multi-host system and you want to know which controller
>> >> >you are to run power cycle from the BMC side.
>> >> >
>> >> >We won't be able to change that because it'd change the names for you.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name
  2020-09-07  7:21                                             ` Jiri Pirko
@ 2020-09-07 16:18                                               ` Jakub Kicinski
  0 siblings, 0 replies; 51+ messages in thread
From: Jakub Kicinski @ 2020-09-07 16:18 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Parav Pandit, Parav Pandit, davem, netdev, roid, saeedm, Jiri Pirko

On Mon, 7 Sep 2020 09:21:53 +0200 Jiri Pirko wrote:
> >Putting little more realistic example for Jakub's and your suggestion below.
> >
> >Below is the output for 3 controllers. ( 2 external + 1 local )
> >Each external controller consist of 2 PCI PFs for a external host via single PCIe cable.
> >Each local controller consist of 1 PCI PF.
> >
> >$ devlink port show
> >pci/0000:00:08.0/0: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0 cnum 0 external false
> >pci/0000:00:08.0/1: type eth netdev enp0s8f0_c1pf0 flavour pcipf pfnum 0 cnum 1 external true
> >pci/0000:00:08.1/1: type eth netdev enp0s8f1_c1pf1 flavour pcipf pfnum 1 cnum 1 external true  
> 
> I see cnum 0 and cnum 1, yet you talk about 3 controllers. What did I
> miss?

Heh, good point. Please make sure to put this example in docs so folks
have a reference on how we expect a 2-port smartnic to look.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH net-next v2 0/6] devlink show controller number
  2020-08-25 13:58 [PATCH net-next 0/3] devlink show controller info Parav Pandit
                   ` (2 preceding siblings ...)
  2020-08-25 13:58 ` [PATCH net-next 3/3] net/mlx5: E-switch, Set controller attribute for PCI PF and VF ports Parav Pandit
@ 2020-09-08 14:42 ` Parav Pandit
  2020-09-08 14:42   ` [PATCH net-next v2 1/6] net/mlx5: E-switch, Read controller number from device Parav Pandit
                     ` (5 more replies)
  2020-09-09  4:50 ` [PATCH net-next v3 0/6] devlink show controller number Parav Pandit
  4 siblings, 6 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-08 14:42 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit

From: Parav Pandit <parav@nvidia.com>

Hi Jakub, Dave,

Currently a devlink instance that supports an eswitch handles eswitch
ports of two type of controllers.
(1) controller discovered on same system where eswitch resides.
This is the case where PCI PF/VF of a controller and devlink eswitch
instance both are located on a single system.
(2) controller located on external system.
This is the case where a controller is plugged in one system and its
devlink eswitch ports are located in a different system. In this case
devlink instance of the eswitch only have access to ports of the
controller.
However, there is no way to describe that a eswitch devlink port
belongs to which controller (mainly which external host controller).
This problem is more prevalent when port attribute such as PF and VF
numbers are overlapping between multiple controllers of same eswitch.
Due to this, for a specific switch_id, unique phys_port_name cannot
be constructed for such devlink ports.

This short series overcomes this limitation by defining two new
attributes.
(a) external: Indicates if port belongs to external controller
(b) controller number: Indicates a controller number of the port

Based on this a unique phys_port_name is prepared using controller
number.

phys_port_name construction using unique controller number is only
applicable to external controller ports. This ensures that for
non smartnic usecases where there is no external controller,
phys_port_name stays same as before.

Patch summary:
Patch-1 Added mlx5 driver to read controller number
Patch-2 Adds the missing comment for the port attributes
Patch-3 Move structure comments away from structure fields
Patch-4 external attribute added for PCI port flavours
Patch-5 Add controller number
Patch-6 Use controller number to build phys_port_name

---
Changelog:
v5->v6:
 - Added mising code for ECPF check; occurred when split the net/mlx5
   patch to individual devlink patches.
v4->v5:
 - Removed controller abstract names 'A' and 'B', instead used
   controller numbers in the commit log description
v3->v4:
 - Split patch for controller number attribute addition and building
   phsy_port_name
 - Removed prefix net/mlx5
v2->v3:
 - Removed controller number setting invokation as it
   is part of different API


Parav Pandit (6):
  net/mlx5: E-switch, Read controller number from device
  devlink: Add comment block for missing port attributes
  devlink: Move structure comments outside of structure
  devlink: Introduce external controller flag
  devlink: Introduce controller number
  devlink: Use controller while building phys_port_name

 .../net/ethernet/mellanox/mlx5/core/en_rep.c  | 13 +++--
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  1 +
 .../mellanox/mlx5/core/eswitch_offloads.c     | 22 +++++++++
 include/net/devlink.h                         | 33 ++++++++++---
 include/uapi/linux/devlink.h                  |  2 +
 net/core/devlink.c                            | 47 +++++++++++++++----
 6 files changed, 99 insertions(+), 19 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH net-next v2 1/6] net/mlx5: E-switch, Read controller number from device
  2020-09-08 14:42 ` [PATCH net-next v2 0/6] devlink show controller number Parav Pandit
@ 2020-09-08 14:42   ` Parav Pandit
  2020-09-08 14:42   ` [PATCH net-next v2 2/6] devlink: Add comment block for missing port attributes Parav Pandit
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-08 14:42 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit, Roi Dayan

From: Parav Pandit <parav@nvidia.com>

ECPF supports one external host controller. Read controller number
from the device.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
---
Changelog:
v1->v2:
 - Removed controller number setting invocation as it
   is part of different API
---
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  1 +
 .../mellanox/mlx5/core/eswitch_offloads.c     | 22 +++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 867d8120b8a5..7455fbd21a0a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -217,6 +217,7 @@ struct mlx5_esw_offload {
 	atomic64_t num_flows;
 	enum devlink_eswitch_encap_mode encap;
 	struct ida vport_metadata_ida;
+	unsigned int host_number; /* ECPF supports one external host */
 };
 
 /* E-Switch MC FDB table hash node */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index d2516922d867..b381cbca5852 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -2110,6 +2110,24 @@ int mlx5_esw_funcs_changed_handler(struct notifier_block *nb, unsigned long type
 	return NOTIFY_OK;
 }
 
+static int mlx5_esw_host_number_init(struct mlx5_eswitch *esw)
+{
+	const u32 *query_host_out;
+
+	if (!mlx5_core_is_ecpf_esw_manager(esw->dev))
+		return 0;
+
+	query_host_out = mlx5_esw_query_functions(esw->dev);
+	if (IS_ERR(query_host_out))
+		return PTR_ERR(query_host_out);
+
+	/* Mark non local controller with non zero controller number. */
+	esw->offloads.host_number = MLX5_GET(query_esw_functions_out, query_host_out,
+					     host_params_context.host_number);
+	kvfree(query_host_out);
+	return 0;
+}
+
 int esw_offloads_enable(struct mlx5_eswitch *esw)
 {
 	struct mlx5_vport *vport;
@@ -2124,6 +2142,10 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
 	mutex_init(&esw->offloads.termtbl_mutex);
 	mlx5_rdma_enable_roce(esw->dev);
 
+	err = mlx5_esw_host_number_init(esw);
+	if (err)
+		goto err_vport_metadata;
+
 	err = esw_set_passing_vport_metadata(esw, true);
 	if (err)
 		goto err_vport_metadata;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next v2 2/6] devlink: Add comment block for missing port attributes
  2020-09-08 14:42 ` [PATCH net-next v2 0/6] devlink show controller number Parav Pandit
  2020-09-08 14:42   ` [PATCH net-next v2 1/6] net/mlx5: E-switch, Read controller number from device Parav Pandit
@ 2020-09-08 14:42   ` Parav Pandit
  2020-09-08 14:42   ` [PATCH net-next v2 3/6] devlink: Move structure comments outside of structure Parav Pandit
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-08 14:42 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit, Jiri Pirko, Roi Dayan

From: Parav Pandit <parav@nvidia.com>

Add comment block for physical, PF and VF port attributes.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
---
 include/net/devlink.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 8f3c8a443238..3c7ba3e1f490 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -73,6 +73,9 @@ struct devlink_port_pci_vf_attrs {
  * @splittable: indicates if the port can be split.
  * @lanes: maximum number of lanes the port supports. 0 value is not passed to netlink.
  * @switch_id: if the port is part of switch, this is buffer with ID, otherwise this is NULL
+ * @phys: physical port attributes
+ * @pci_pf: PCI PF port attributes
+ * @pci_vf: PCI VF port attributes
  */
 struct devlink_port_attrs {
 	u8 split:1,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next v2 3/6] devlink: Move structure comments outside of structure
  2020-09-08 14:42 ` [PATCH net-next v2 0/6] devlink show controller number Parav Pandit
  2020-09-08 14:42   ` [PATCH net-next v2 1/6] net/mlx5: E-switch, Read controller number from device Parav Pandit
  2020-09-08 14:42   ` [PATCH net-next v2 2/6] devlink: Add comment block for missing port attributes Parav Pandit
@ 2020-09-08 14:42   ` Parav Pandit
  2020-09-08 14:42   ` [PATCH net-next v2 4/6] devlink: Introduce external controller flag Parav Pandit
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-08 14:42 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit, Jiri Pirko

From: Parav Pandit <parav@nvidia.com>

To add more fields to the PCI PF and VF port attributes, follow standard
structure comment format.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
Changelog:
v2->v3:
 - New patch
---
 include/net/devlink.h | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 3c7ba3e1f490..efff9274d248 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -57,13 +57,22 @@ struct devlink_port_phys_attrs {
 	u32 split_subport_number; /* If the port is split, this is the number of subport. */
 };
 
+/**
+ * struct devlink_port_pci_pf_attrs - devlink port's PCI PF attributes
+ * @pf: Associated PCI PF number for this port.
+ */
 struct devlink_port_pci_pf_attrs {
-	u16 pf;	/* Associated PCI PF for this port. */
+	u16 pf;
 };
 
+/**
+ * struct devlink_port_pci_vf_attrs - devlink port's PCI VF attributes
+ * @pf: Associated PCI PF number for this port.
+ * @vf: Associated PCI VF for of the PCI PF for this port.
+ */
 struct devlink_port_pci_vf_attrs {
-	u16 pf;	/* Associated PCI PF for this port. */
-	u16 vf;	/* Associated PCI VF for of the PCI PF for this port. */
+	u16 pf;
+	u16 vf;
 };
 
 /**
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next v2 4/6] devlink: Introduce external controller flag
  2020-09-08 14:42 ` [PATCH net-next v2 0/6] devlink show controller number Parav Pandit
                     ` (2 preceding siblings ...)
  2020-09-08 14:42   ` [PATCH net-next v2 3/6] devlink: Move structure comments outside of structure Parav Pandit
@ 2020-09-08 14:42   ` Parav Pandit
  2020-09-08 14:42   ` [PATCH net-next v2 5/6] devlink: Introduce controller number Parav Pandit
  2020-09-08 14:42   ` [PATCH net-next v2 6/6] devlink: Use controller while building phys_port_name Parav Pandit
  5 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-08 14:42 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit, Jiri Pirko

From: Parav Pandit <parav@nvidia.com>

A devlink eswitch port may represent PCI PF/VF ports of a controller.

A controller either located on same system or it can be an external
controller located in host where such NIC is plugged in.

Add the ability for driver to specify if a port is for external
controller.

Use such flag in the mlx5_core driver.

An example of an external controller having VF1 of PF0 belong to
controller 1.

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00
$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
Changelog:
v1->v2:
 - Addressed comments from Jiri and Jakub
 - New patch
 - Split 'external' attribute from 'external controller number'.
 - Merged mlx5_core driver to avoid compiliation break
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  6 ++++--
 include/net/devlink.h                            |  8 ++++++--
 include/uapi/linux/devlink.h                     |  1 +
 net/core/devlink.c                               | 12 ++++++++++--
 4 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index e13e5d1b3eae..5b3599caa007 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1215,11 +1215,13 @@ static int register_devlink_port(struct mlx5_core_dev *dev,
 	struct devlink_port_attrs attrs = {};
 	struct netdev_phys_item_id ppid = {};
 	unsigned int dl_port_index = 0;
+	bool external;
 	u16 pfnum;
 
 	if (!is_devlink_port_supported(dev, rpriv))
 		return 0;
 
+	external = mlx5_core_is_ecpf_esw_manager(dev);
 	mlx5e_rep_get_port_parent_id(rpriv->netdev, &ppid);
 	dl_port_index = mlx5_esw_vport_to_devlink_port_index(dev, rep->vport);
 	pfnum = PCI_FUNC(dev->pdev->devfn);
@@ -1232,12 +1234,12 @@ static int register_devlink_port(struct mlx5_core_dev *dev,
 	} else if (rep->vport == MLX5_VPORT_PF) {
 		memcpy(rpriv->dl_port.attrs.switch_id.id, &ppid.id[0], ppid.id_len);
 		rpriv->dl_port.attrs.switch_id.id_len = ppid.id_len;
-		devlink_port_attrs_pci_pf_set(&rpriv->dl_port, pfnum);
+		devlink_port_attrs_pci_pf_set(&rpriv->dl_port, pfnum, external);
 	} else if (mlx5_eswitch_is_vf_vport(dev->priv.eswitch, rpriv->rep->vport)) {
 		memcpy(rpriv->dl_port.attrs.switch_id.id, &ppid.id[0], ppid.id_len);
 		rpriv->dl_port.attrs.switch_id.id_len = ppid.id_len;
 		devlink_port_attrs_pci_vf_set(&rpriv->dl_port,
-					      pfnum, rep->vport - 1);
+					      pfnum, rep->vport - 1, external);
 	}
 	return devlink_port_register(devlink, &rpriv->dl_port, dl_port_index);
 }
diff --git a/include/net/devlink.h b/include/net/devlink.h
index efff9274d248..2dad8c9151f4 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -60,19 +60,23 @@ struct devlink_port_phys_attrs {
 /**
  * struct devlink_port_pci_pf_attrs - devlink port's PCI PF attributes
  * @pf: Associated PCI PF number for this port.
+ * @external: when set, indicates if a port is for an external controller
  */
 struct devlink_port_pci_pf_attrs {
 	u16 pf;
+	u8 external:1;
 };
 
 /**
  * struct devlink_port_pci_vf_attrs - devlink port's PCI VF attributes
  * @pf: Associated PCI PF number for this port.
  * @vf: Associated PCI VF for of the PCI PF for this port.
+ * @external: when set, indicates if a port is for an external controller
  */
 struct devlink_port_pci_vf_attrs {
 	u16 pf;
 	u16 vf;
+	u8 external:1;
 };
 
 /**
@@ -1215,9 +1219,9 @@ void devlink_port_type_ib_set(struct devlink_port *devlink_port,
 void devlink_port_type_clear(struct devlink_port *devlink_port);
 void devlink_port_attrs_set(struct devlink_port *devlink_port,
 			    struct devlink_port_attrs *devlink_port_attrs);
-void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf);
+void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf, bool external);
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
-				   u16 pf, u16 vf);
+				   u16 pf, u16 vf, bool external);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u32 size, u16 ingress_pools_count,
 			u16 egress_pools_count, u16 ingress_tc_count,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index cfef4245ea5a..40823ed7e05a 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -458,6 +458,7 @@ enum devlink_attr {
 	DEVLINK_ATTR_PORT_LANES,			/* u32 */
 	DEVLINK_ATTR_PORT_SPLITTABLE,			/* u8 */
 
+	DEVLINK_ATTR_PORT_EXTERNAL,		/* u8 */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 49e911c19881..6f5f85372721 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -526,6 +526,8 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		if (nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
 				attrs->pci_pf.pf))
 			return -EMSGSIZE;
+		if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_pf.external))
+			return -EMSGSIZE;
 		break;
 	case DEVLINK_PORT_FLAVOUR_PCI_VF:
 		if (nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
@@ -533,6 +535,8 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		    nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER,
 				attrs->pci_vf.vf))
 			return -EMSGSIZE;
+		if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_vf.external))
+			return -EMSGSIZE;
 		break;
 	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
 	case DEVLINK_PORT_FLAVOUR_CPU:
@@ -7716,8 +7720,9 @@ EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
  *
  *	@devlink_port: devlink port
  *	@pf: associated PF for the devlink port instance
+ *	@external: indicates if the port is for an external controller
  */
-void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf)
+void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf, bool external)
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
 	int ret;
@@ -7728,6 +7733,7 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf)
 		return;
 
 	attrs->pci_pf.pf = pf;
+	attrs->pci_pf.external = external;
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_pf_set);
 
@@ -7737,9 +7743,10 @@ EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_pf_set);
  *	@devlink_port: devlink port
  *	@pf: associated PF for the devlink port instance
  *	@vf: associated VF of a PF for the devlink port instance
+ *	@external: indicates if the port is for an external controller
  */
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
-				   u16 pf, u16 vf)
+				   u16 pf, u16 vf, bool external)
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
 	int ret;
@@ -7750,6 +7757,7 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
 		return;
 	attrs->pci_vf.pf = pf;
 	attrs->pci_vf.vf = vf;
+	attrs->pci_vf.external = external;
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next v2 5/6] devlink: Introduce controller number
  2020-09-08 14:42 ` [PATCH net-next v2 0/6] devlink show controller number Parav Pandit
                     ` (3 preceding siblings ...)
  2020-09-08 14:42   ` [PATCH net-next v2 4/6] devlink: Introduce external controller flag Parav Pandit
@ 2020-09-08 14:42   ` Parav Pandit
  2020-09-08 18:50     ` Jakub Kicinski
  2020-09-08 14:42   ` [PATCH net-next v2 6/6] devlink: Use controller while building phys_port_name Parav Pandit
  5 siblings, 1 reply; 51+ messages in thread
From: Parav Pandit @ 2020-09-08 14:42 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit, Jiri Pirko

From: Parav Pandit <parav@nvidia.com>

A devlink port may be for a controller consist of PCI device.
A devlink instance holds ports of two types of controllers.
(1) controller discovered on same system where eswitch resides
This is the case where PCI PF/VF of a controller and devlink eswitch
instance both are located on a single system.
(2) controller located on external host system.
This is the case where a controller is located in one system and its
devlink eswitch ports are located in a different system.

When a devlink eswitch instance serves the devlink ports of both
controllers together, PCI PF/VF numbers may overlap.
Due to this a unique phys_port_name cannot be constructed.

For example in below such system controller-A and controller-B, each has
PCI PF pf0 whose eswitch ports are present in controller-A.
These results in phys_port_name as "pf0" for both.
Similar problem exists for VFs and upcoming Sub functions.

An example view of two controller systems:

                -----------------------------------------------------
                |                                                   |
                |           --------- ---------                     |
-------------   |           | vf(s) | | sf(s) |                     |
| server    |   | -------   ----/---- ---/-----  -------            |
| pci rc    |=====| pf0 |______/________/        | pf1 |            |
| connection|   | -------                        -------            |
-------------   |     | controller-B (no eswitch) (controller num=1)|
                ------|----------------------------------------------
                (internal wire)
                      |
                -----------------------------------------------------
                |  devlink eswitch ports and reps                   |
                |  ---------------------------------------------    |
                |  |ctrl-A | ctrl-B | ctrl-A | ctrl-B | ctrl-B |    |
                |  |pf0    | pf0    | pf0vfN | pf0vfN | pf0sfN |    |
                |  ---------------------------------------------    |
                |                                                   |
                |           ---------                               |
                |           | sf(s) |                               |
                | -------   ---/-----    -------                    |
                | | pf0 |_____/          | pf1 |                    |
                | -------                -------                    |
                |                                                   |
                |  local controller-A (eswitch) (controller num=0)  |
                -----------------------------------------------------

An example devlink port for external controller with controller
number = 1 for a VF 1 of PF 0:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0pf0vf1",
            "flavour": "pcivf",
            "controller": 1,
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
Changelog:
v1->v2:
 - Added text diagram of multiple controllers
 - Updated example for a VF
 - Addressed comments from Jiri and Jakub
 - Moved controller number attribute to PCI port flavours
   This enables to better, hirerchical view with controller and its
    PF, VF numbers
 - Split 'external' and 'controller number' attributes as two
   different attributes
 - Merged mlx5_core driver to avoid compiliation break
---
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |  9 ++++++--
 include/net/devlink.h                         |  9 ++++++--
 include/uapi/linux/devlink.h                  |  1 +
 net/core/devlink.c                            | 23 +++++++++++--------
 4 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 5b3599caa007..135ee26881c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1210,11 +1210,13 @@ is_devlink_port_supported(const struct mlx5_core_dev *dev,
 static int register_devlink_port(struct mlx5_core_dev *dev,
 				 struct mlx5e_rep_priv *rpriv)
 {
+	struct mlx5_esw_offload *offloads = &dev->priv.eswitch->offloads;
 	struct devlink *devlink = priv_to_devlink(dev);
 	struct mlx5_eswitch_rep *rep = rpriv->rep;
 	struct devlink_port_attrs attrs = {};
 	struct netdev_phys_item_id ppid = {};
 	unsigned int dl_port_index = 0;
+	u32 controller_num = 0;
 	bool external;
 	u16 pfnum;
 
@@ -1222,6 +1224,8 @@ static int register_devlink_port(struct mlx5_core_dev *dev,
 		return 0;
 
 	external = mlx5_core_is_ecpf_esw_manager(dev);
+	if (external)
+		controller_num = offloads->host_number + 1;
 	mlx5e_rep_get_port_parent_id(rpriv->netdev, &ppid);
 	dl_port_index = mlx5_esw_vport_to_devlink_port_index(dev, rep->vport);
 	pfnum = PCI_FUNC(dev->pdev->devfn);
@@ -1234,11 +1238,12 @@ static int register_devlink_port(struct mlx5_core_dev *dev,
 	} else if (rep->vport == MLX5_VPORT_PF) {
 		memcpy(rpriv->dl_port.attrs.switch_id.id, &ppid.id[0], ppid.id_len);
 		rpriv->dl_port.attrs.switch_id.id_len = ppid.id_len;
-		devlink_port_attrs_pci_pf_set(&rpriv->dl_port, pfnum, external);
+		devlink_port_attrs_pci_pf_set(&rpriv->dl_port, controller_num,
+					      pfnum, external);
 	} else if (mlx5_eswitch_is_vf_vport(dev->priv.eswitch, rpriv->rep->vport)) {
 		memcpy(rpriv->dl_port.attrs.switch_id.id, &ppid.id[0], ppid.id_len);
 		rpriv->dl_port.attrs.switch_id.id_len = ppid.id_len;
-		devlink_port_attrs_pci_vf_set(&rpriv->dl_port,
+		devlink_port_attrs_pci_vf_set(&rpriv->dl_port, controller_num,
 					      pfnum, rep->vport - 1, external);
 	}
 	return devlink_port_register(devlink, &rpriv->dl_port, dl_port_index);
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 2dad8c9151f4..eaec0a8cc5ef 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -59,21 +59,25 @@ struct devlink_port_phys_attrs {
 
 /**
  * struct devlink_port_pci_pf_attrs - devlink port's PCI PF attributes
+ * @controller: Associated controller number
  * @pf: Associated PCI PF number for this port.
  * @external: when set, indicates if a port is for an external controller
  */
 struct devlink_port_pci_pf_attrs {
+	u32 controller;
 	u16 pf;
 	u8 external:1;
 };
 
 /**
  * struct devlink_port_pci_vf_attrs - devlink port's PCI VF attributes
+ * @controller: Associated controller number
  * @pf: Associated PCI PF number for this port.
  * @vf: Associated PCI VF for of the PCI PF for this port.
  * @external: when set, indicates if a port is for an external controller
  */
 struct devlink_port_pci_vf_attrs {
+	u32 controller;
 	u16 pf;
 	u16 vf;
 	u8 external:1;
@@ -1219,8 +1223,9 @@ void devlink_port_type_ib_set(struct devlink_port *devlink_port,
 void devlink_port_type_clear(struct devlink_port *devlink_port);
 void devlink_port_attrs_set(struct devlink_port *devlink_port,
 			    struct devlink_port_attrs *devlink_port_attrs);
-void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf, bool external);
-void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
+void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 controller,
+				   u16 pf, bool external);
+void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
 				   u16 pf, u16 vf, bool external);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u32 size, u16 ingress_pools_count,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 40823ed7e05a..40d35145c879 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -459,6 +459,7 @@ enum devlink_attr {
 	DEVLINK_ATTR_PORT_SPLITTABLE,			/* u8 */
 
 	DEVLINK_ATTR_PORT_EXTERNAL,		/* u8 */
+	DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,	/* u32 */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 6f5f85372721..9cf5b118253b 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -523,17 +523,18 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		return -EMSGSIZE;
 	switch (devlink_port->attrs.flavour) {
 	case DEVLINK_PORT_FLAVOUR_PCI_PF:
-		if (nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
-				attrs->pci_pf.pf))
+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,
+				attrs->pci_pf.controller) ||
+		    nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_pf.pf))
 			return -EMSGSIZE;
 		if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_pf.external))
 			return -EMSGSIZE;
 		break;
 	case DEVLINK_PORT_FLAVOUR_PCI_VF:
-		if (nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
-				attrs->pci_vf.pf) ||
-		    nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER,
-				attrs->pci_vf.vf))
+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,
+				attrs->pci_vf.controller) ||
+		    nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_vf.pf) ||
+		    nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER, attrs->pci_vf.vf))
 			return -EMSGSIZE;
 		if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_vf.external))
 			return -EMSGSIZE;
@@ -7719,10 +7720,12 @@ EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
  *	devlink_port_attrs_pci_pf_set - Set PCI PF port attributes
  *
  *	@devlink_port: devlink port
+ *	@controller: associated controller number for the devlink port instance
  *	@pf: associated PF for the devlink port instance
  *	@external: indicates if the port is for an external controller
  */
-void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf, bool external)
+void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 controller,
+				   u16 pf, bool external)
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
 	int ret;
@@ -7731,7 +7734,7 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf, bo
 				       DEVLINK_PORT_FLAVOUR_PCI_PF);
 	if (ret)
 		return;
-
+	attrs->pci_pf.controller = controller;
 	attrs->pci_pf.pf = pf;
 	attrs->pci_pf.external = external;
 }
@@ -7741,11 +7744,12 @@ EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_pf_set);
  *	devlink_port_attrs_pci_vf_set - Set PCI VF port attributes
  *
  *	@devlink_port: devlink port
+ *	@controller: associated controller number for the devlink port instance
  *	@pf: associated PF for the devlink port instance
  *	@vf: associated VF of a PF for the devlink port instance
  *	@external: indicates if the port is for an external controller
  */
-void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
+void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
 				   u16 pf, u16 vf, bool external)
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
@@ -7755,6 +7759,7 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
 				       DEVLINK_PORT_FLAVOUR_PCI_VF);
 	if (ret)
 		return;
+	attrs->pci_vf.controller = controller;
 	attrs->pci_vf.pf = pf;
 	attrs->pci_vf.vf = vf;
 	attrs->pci_vf.external = external;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next v2 6/6] devlink: Use controller while building phys_port_name
  2020-09-08 14:42 ` [PATCH net-next v2 0/6] devlink show controller number Parav Pandit
                     ` (4 preceding siblings ...)
  2020-09-08 14:42   ` [PATCH net-next v2 5/6] devlink: Introduce controller number Parav Pandit
@ 2020-09-08 14:42   ` Parav Pandit
  5 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-08 14:42 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit, Jiri Pirko

From: Parav Pandit <parav@nvidia.com>

Now that controller number attribute is available, use it when
building phsy_port_name for external controller ports.

An example devlink port and representor netdev name consist of controller
annotation for external controller with controller number = 1,
for a VF 1 of PF 0:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0c1pf0vf1",
            "flavour": "pcivf",
            "controller": 1,
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Controller number annotation is skipped for non external controllers to
maintain backward compatibility.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
Changelog:
v1->v2:
 - New patch
---
 net/core/devlink.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 9cf5b118253b..91c12612f2b7 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -7793,9 +7793,23 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 		WARN_ON(1);
 		return -EINVAL;
 	case DEVLINK_PORT_FLAVOUR_PCI_PF:
+		if (attrs->pci_pf.external) {
+			n = snprintf(name, len, "c%u", attrs->pci_pf.controller);
+			if (n >= len)
+				return -EINVAL;
+			len -= n;
+			name += n;
+		}
 		n = snprintf(name, len, "pf%u", attrs->pci_pf.pf);
 		break;
 	case DEVLINK_PORT_FLAVOUR_PCI_VF:
+		if (attrs->pci_vf.external) {
+			n = snprintf(name, len, "c%u", attrs->pci_vf.controller);
+			if (n >= len)
+				return -EINVAL;
+			len -= n;
+			name += n;
+		}
 		n = snprintf(name, len, "pf%uvf%u",
 			     attrs->pci_vf.pf, attrs->pci_vf.vf);
 		break;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next v2 5/6] devlink: Introduce controller number
  2020-09-08 14:42   ` [PATCH net-next v2 5/6] devlink: Introduce controller number Parav Pandit
@ 2020-09-08 18:50     ` Jakub Kicinski
  2020-09-09  3:06       ` Parav Pandit
  0 siblings, 1 reply; 51+ messages in thread
From: Jakub Kicinski @ 2020-09-08 18:50 UTC (permalink / raw)
  To: Parav Pandit; +Cc: davem, netdev, Parav Pandit, Jiri Pirko

On Tue,  8 Sep 2020 17:42:40 +0300 Parav Pandit wrote:
> A devlink port may be for a controller consist of PCI device.

Humm?

> A devlink instance holds ports of two types of controllers.
> (1) controller discovered on same system where eswitch resides
> This is the case where PCI PF/VF of a controller and devlink eswitch
> instance both are located on a single system.
> (2) controller located on external host system.
> This is the case where a controller is located in one system and its
> devlink eswitch ports are located in a different system.
> 
> When a devlink eswitch instance serves the devlink ports of both
> controllers together, PCI PF/VF numbers may overlap.
> Due to this a unique phys_port_name cannot be constructed.
> 
> For example in below such system controller-A and controller-B, each has
> PCI PF pf0 whose eswitch ports are present in controller-A.
> These results in phys_port_name as "pf0" for both.
> Similar problem exists for VFs and upcoming Sub functions.
> 
> An example view of two controller systems:
> 
>                 -----------------------------------------------------
>                 |                                                   |
>                 |           --------- ---------                     |
> -------------   |           | vf(s) | | sf(s) |                     |
> | server    |   | -------   ----/---- ---/-----  -------            |
> | pci rc    |=====| pf0 |______/________/        | pf1 |            |
> | connection|   | -------                        -------            |
> -------------   |     | controller-B (no eswitch) (controller num=1)|
>                 ------|----------------------------------------------
>                 (internal wire)
>                       |
>                 -----------------------------------------------------
>                 |  devlink eswitch ports and reps                   |
>                 |  ---------------------------------------------    |
>                 |  |ctrl-A | ctrl-B | ctrl-A | ctrl-B | ctrl-B |    |
>                 |  |pf0    | pf0    | pf0vfN | pf0vfN | pf0sfN |    |
>                 |  ---------------------------------------------    |

                                       ^^^^^^^^

ctrl-A doesn't have VFs, but sfs below.

pf1 reprs are not listed.

Perhaps it'd be clearer if controllers where not interleaved?

>                 |                                                   |
>                 |           ---------                               |
>                 |           | sf(s) |                               |
>                 | -------   ---/-----    -------                    |
>                 | | pf0 |_____/          | pf1 |                    |
>                 | -------                -------                    |
>                 |                                                   |
>                 |  local controller-A (eswitch) (controller num=0)  |
>                 -----------------------------------------------------

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH net-next v2 5/6] devlink: Introduce controller number
  2020-09-08 18:50     ` Jakub Kicinski
@ 2020-09-09  3:06       ` Parav Pandit
  0 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-09  3:06 UTC (permalink / raw)
  To: Jakub Kicinski, Parav Pandit; +Cc: davem, netdev, Jiri Pirko



> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, September 9, 2020 12:20 AM
> 
> Humm?
> 
> > A devlink instance holds ports of two types of controllers.
> > (1) controller discovered on same system where eswitch resides This is
> > the case where PCI PF/VF of a controller and devlink eswitch instance
> > both are located on a single system.
> > (2) controller located on external host system.
> > This is the case where a controller is located in one system and its
> > devlink eswitch ports are located in a different system.
> >
> > When a devlink eswitch instance serves the devlink ports of both
> > controllers together, PCI PF/VF numbers may overlap.
> > Due to this a unique phys_port_name cannot be constructed.
> >
> > For example in below such system controller-A and controller-B, each
> > has PCI PF pf0 whose eswitch ports are present in controller-A.
> > These results in phys_port_name as "pf0" for both.
> > Similar problem exists for VFs and upcoming Sub functions.
> >
> > An example view of two controller systems:
> >
> >                 -----------------------------------------------------
> >                 |                                                   |
> >                 |           --------- ---------                     |
> > -------------   |           | vf(s) | | sf(s) |                     |
> > | server    |   | -------   ----/---- ---/-----  -------            |
> > | pci rc    |=====| pf0 |______/________/        | pf1 |            |
> > | connection|   | -------                        -------            |
> > -------------   |     | controller-B (no eswitch) (controller num=1)|
> >                 ------|----------------------------------------------
> >                 (internal wire)
> >                       |
> >                 -----------------------------------------------------
> >                 |  devlink eswitch ports and reps                   |
> >                 |  ---------------------------------------------    |
> >                 |  |ctrl-A | ctrl-B | ctrl-A | ctrl-B | ctrl-B |    |
> >                 |  |pf0    | pf0    | pf0vfN | pf0vfN | pf0sfN |    |
> >                 |  ---------------------------------------------    |
> 
>                                        ^^^^^^^^
> 
> ctrl-A doesn't have VFs, but sfs below.
>
Right. Instead of showing too many overlapping devices in both controllers, picked sf ports.
 
> pf1 reprs are not listed.
>
It was hard to cover replicate same topology as that of pf0, so It is omitted.
I guess I should put that note to avoid this confusion.
 
> Perhaps it'd be clearer if controllers where not interleaved?
Yes, Jiri also pointed out to get rid of naming A and B and use numbers.
Little older diagram got it. :-(

> 
> >                 |                                                   |
> >                 |           ---------                               |
> >                 |           | sf(s) |                               |
> >                 | -------   ---/-----    -------                    |
> >                 | | pf0 |_____/          | pf1 |                    |
> >                 | -------                -------                    |
> >                 |                                                   |
> >                 |  local controller-A (eswitch) (controller num=0)  |
> >                 -----------------------------------------------------

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH net-next v3 0/6] devlink show controller number
  2020-08-25 13:58 [PATCH net-next 0/3] devlink show controller info Parav Pandit
                   ` (3 preceding siblings ...)
  2020-09-08 14:42 ` [PATCH net-next v2 0/6] devlink show controller number Parav Pandit
@ 2020-09-09  4:50 ` Parav Pandit
  2020-09-09  4:50   ` [PATCH net-next v3 1/6] net/mlx5: E-switch, Read controller number from device Parav Pandit
                     ` (6 more replies)
  4 siblings, 7 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-09  4:50 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit

From: Parav Pandit <parav@nvidia.com>

Hi Jakub, Dave,

Currently a devlink instance that supports an eswitch handles eswitch
ports of two type of controllers.
(1) controller discovered on same system where eswitch resides.
This is the case where PCI PF/VF of a controller and devlink eswitch
instance both are located on a single system.
(2) controller located on external system.
This is the case where a controller is plugged in one system and its
devlink eswitch ports are located in a different system. In this case
devlink instance of the eswitch only have access to ports of the
controller.
However, there is no way to describe that a eswitch devlink port
belongs to which controller (mainly which external host controller).
This problem is more prevalent when port attribute such as PF and VF
numbers are overlapping between multiple controllers of same eswitch.
Due to this, for a specific switch_id, unique phys_port_name cannot
be constructed for such devlink ports.

This short series overcomes this limitation by defining two new
attributes.
(a) external: Indicates if port belongs to external controller
(b) controller number: Indicates a controller number of the port

Based on this a unique phys_port_name is prepared using controller
number.

phys_port_name construction using unique controller number is only
applicable to external controller ports. This ensures that for
non smartnic usecases where there is no external controller,
phys_port_name stays same as before.

Patch summary:
Patch-1 Added mlx5 driver to read controller number
Patch-2 Adds the missing comment for the port attributes
Patch-3 Move structure comments away from structure fields
Patch-4 external attribute added for PCI port flavours
Patch-5 Add controller number
Patch-6 Use controller number to build phys_port_name

---
Changelog:
v2->v3:
 - Updated diagram to get rid of controller 'A' and 'B'
 - Kept ports of single controller together in diagram
 - Updated diagram for pf1's VF and SF and its ports
v1->v2:
 - Added text diagram of multiple controllers
 - Updated example for a VF
 - Addressed comments from Jiri and Jakub
 - Moved controller number attribute to PCI port flavours
   This enables to better, hirerchical view with controller and its
    PF, VF numbers
 - Split 'external' and 'controller number' attributes as two
   different attributes
 - Merged mlx5_core driver to avoid compiliation break

Parav Pandit (6):
  net/mlx5: E-switch, Read controller number from device
  devlink: Add comment block for missing port attributes
  devlink: Move structure comments outside of structure
  devlink: Introduce external controller flag
  devlink: Introduce controller number
  devlink: Use controller while building phys_port_name

 .../net/ethernet/mellanox/mlx5/core/en_rep.c  | 13 +++--
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  1 +
 .../mellanox/mlx5/core/eswitch_offloads.c     | 22 +++++++++
 include/net/devlink.h                         | 33 ++++++++++---
 include/uapi/linux/devlink.h                  |  2 +
 net/core/devlink.c                            | 47 +++++++++++++++----
 6 files changed, 99 insertions(+), 19 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH net-next v3 1/6] net/mlx5: E-switch, Read controller number from device
  2020-09-09  4:50 ` [PATCH net-next v3 0/6] devlink show controller number Parav Pandit
@ 2020-09-09  4:50   ` Parav Pandit
  2020-09-09  4:50   ` [PATCH net-next v3 2/6] devlink: Add comment block for missing port attributes Parav Pandit
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-09  4:50 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit, Roi Dayan

From: Parav Pandit <parav@nvidia.com>

ECPF supports one external host controller. Read controller number
from the device.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
---
Changelog:
v1->v2:
 - Removed controller number setting invocation as it
   is part of different API
---
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  1 +
 .../mellanox/mlx5/core/eswitch_offloads.c     | 22 +++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 867d8120b8a5..7455fbd21a0a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -217,6 +217,7 @@ struct mlx5_esw_offload {
 	atomic64_t num_flows;
 	enum devlink_eswitch_encap_mode encap;
 	struct ida vport_metadata_ida;
+	unsigned int host_number; /* ECPF supports one external host */
 };
 
 /* E-Switch MC FDB table hash node */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index d2516922d867..b381cbca5852 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -2110,6 +2110,24 @@ int mlx5_esw_funcs_changed_handler(struct notifier_block *nb, unsigned long type
 	return NOTIFY_OK;
 }
 
+static int mlx5_esw_host_number_init(struct mlx5_eswitch *esw)
+{
+	const u32 *query_host_out;
+
+	if (!mlx5_core_is_ecpf_esw_manager(esw->dev))
+		return 0;
+
+	query_host_out = mlx5_esw_query_functions(esw->dev);
+	if (IS_ERR(query_host_out))
+		return PTR_ERR(query_host_out);
+
+	/* Mark non local controller with non zero controller number. */
+	esw->offloads.host_number = MLX5_GET(query_esw_functions_out, query_host_out,
+					     host_params_context.host_number);
+	kvfree(query_host_out);
+	return 0;
+}
+
 int esw_offloads_enable(struct mlx5_eswitch *esw)
 {
 	struct mlx5_vport *vport;
@@ -2124,6 +2142,10 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
 	mutex_init(&esw->offloads.termtbl_mutex);
 	mlx5_rdma_enable_roce(esw->dev);
 
+	err = mlx5_esw_host_number_init(esw);
+	if (err)
+		goto err_vport_metadata;
+
 	err = esw_set_passing_vport_metadata(esw, true);
 	if (err)
 		goto err_vport_metadata;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next v3 2/6] devlink: Add comment block for missing port attributes
  2020-09-09  4:50 ` [PATCH net-next v3 0/6] devlink show controller number Parav Pandit
  2020-09-09  4:50   ` [PATCH net-next v3 1/6] net/mlx5: E-switch, Read controller number from device Parav Pandit
@ 2020-09-09  4:50   ` Parav Pandit
  2020-09-09  4:50   ` [PATCH net-next v3 3/6] devlink: Move structure comments outside of structure Parav Pandit
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-09  4:50 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit, Jiri Pirko, Roi Dayan

From: Parav Pandit <parav@nvidia.com>

Add comment block for physical, PF and VF port attributes.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
---
 include/net/devlink.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 8f3c8a443238..3c7ba3e1f490 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -73,6 +73,9 @@ struct devlink_port_pci_vf_attrs {
  * @splittable: indicates if the port can be split.
  * @lanes: maximum number of lanes the port supports. 0 value is not passed to netlink.
  * @switch_id: if the port is part of switch, this is buffer with ID, otherwise this is NULL
+ * @phys: physical port attributes
+ * @pci_pf: PCI PF port attributes
+ * @pci_vf: PCI VF port attributes
  */
 struct devlink_port_attrs {
 	u8 split:1,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next v3 3/6] devlink: Move structure comments outside of structure
  2020-09-09  4:50 ` [PATCH net-next v3 0/6] devlink show controller number Parav Pandit
  2020-09-09  4:50   ` [PATCH net-next v3 1/6] net/mlx5: E-switch, Read controller number from device Parav Pandit
  2020-09-09  4:50   ` [PATCH net-next v3 2/6] devlink: Add comment block for missing port attributes Parav Pandit
@ 2020-09-09  4:50   ` Parav Pandit
  2020-09-09  4:50   ` [PATCH net-next v3 4/6] devlink: Introduce external controller flag Parav Pandit
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-09  4:50 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit, Jiri Pirko

From: Parav Pandit <parav@nvidia.com>

To add more fields to the PCI PF and VF port attributes, follow standard
structure comment format.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
Changelog:
v2->v3:
 - New patch
---
 include/net/devlink.h | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 3c7ba3e1f490..efff9274d248 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -57,13 +57,22 @@ struct devlink_port_phys_attrs {
 	u32 split_subport_number; /* If the port is split, this is the number of subport. */
 };
 
+/**
+ * struct devlink_port_pci_pf_attrs - devlink port's PCI PF attributes
+ * @pf: Associated PCI PF number for this port.
+ */
 struct devlink_port_pci_pf_attrs {
-	u16 pf;	/* Associated PCI PF for this port. */
+	u16 pf;
 };
 
+/**
+ * struct devlink_port_pci_vf_attrs - devlink port's PCI VF attributes
+ * @pf: Associated PCI PF number for this port.
+ * @vf: Associated PCI VF for of the PCI PF for this port.
+ */
 struct devlink_port_pci_vf_attrs {
-	u16 pf;	/* Associated PCI PF for this port. */
-	u16 vf;	/* Associated PCI VF for of the PCI PF for this port. */
+	u16 pf;
+	u16 vf;
 };
 
 /**
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next v3 4/6] devlink: Introduce external controller flag
  2020-09-09  4:50 ` [PATCH net-next v3 0/6] devlink show controller number Parav Pandit
                     ` (2 preceding siblings ...)
  2020-09-09  4:50   ` [PATCH net-next v3 3/6] devlink: Move structure comments outside of structure Parav Pandit
@ 2020-09-09  4:50   ` Parav Pandit
  2020-09-09  4:50   ` [PATCH net-next v3 5/6] devlink: Introduce controller number Parav Pandit
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-09  4:50 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit, Jiri Pirko

From: Parav Pandit <parav@nvidia.com>

A devlink eswitch port may represent PCI PF/VF ports of a controller.

A controller either located on same system or it can be an external
controller located in host where such NIC is plugged in.

Add the ability for driver to specify if a port is for external
controller.

Use such flag in the mlx5_core driver.

An example of an external controller having VF1 of PF0 belong to
controller 1.

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00
$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
Changelog:
v1->v2:
 - Addressed comments from Jiri and Jakub
 - New patch
 - Split 'external' attribute from 'external controller number'.
 - Merged mlx5_core driver to avoid compiliation break
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  6 ++++--
 include/net/devlink.h                            |  8 ++++++--
 include/uapi/linux/devlink.h                     |  1 +
 net/core/devlink.c                               | 12 ++++++++++--
 4 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index e13e5d1b3eae..5b3599caa007 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1215,11 +1215,13 @@ static int register_devlink_port(struct mlx5_core_dev *dev,
 	struct devlink_port_attrs attrs = {};
 	struct netdev_phys_item_id ppid = {};
 	unsigned int dl_port_index = 0;
+	bool external;
 	u16 pfnum;
 
 	if (!is_devlink_port_supported(dev, rpriv))
 		return 0;
 
+	external = mlx5_core_is_ecpf_esw_manager(dev);
 	mlx5e_rep_get_port_parent_id(rpriv->netdev, &ppid);
 	dl_port_index = mlx5_esw_vport_to_devlink_port_index(dev, rep->vport);
 	pfnum = PCI_FUNC(dev->pdev->devfn);
@@ -1232,12 +1234,12 @@ static int register_devlink_port(struct mlx5_core_dev *dev,
 	} else if (rep->vport == MLX5_VPORT_PF) {
 		memcpy(rpriv->dl_port.attrs.switch_id.id, &ppid.id[0], ppid.id_len);
 		rpriv->dl_port.attrs.switch_id.id_len = ppid.id_len;
-		devlink_port_attrs_pci_pf_set(&rpriv->dl_port, pfnum);
+		devlink_port_attrs_pci_pf_set(&rpriv->dl_port, pfnum, external);
 	} else if (mlx5_eswitch_is_vf_vport(dev->priv.eswitch, rpriv->rep->vport)) {
 		memcpy(rpriv->dl_port.attrs.switch_id.id, &ppid.id[0], ppid.id_len);
 		rpriv->dl_port.attrs.switch_id.id_len = ppid.id_len;
 		devlink_port_attrs_pci_vf_set(&rpriv->dl_port,
-					      pfnum, rep->vport - 1);
+					      pfnum, rep->vport - 1, external);
 	}
 	return devlink_port_register(devlink, &rpriv->dl_port, dl_port_index);
 }
diff --git a/include/net/devlink.h b/include/net/devlink.h
index efff9274d248..2dad8c9151f4 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -60,19 +60,23 @@ struct devlink_port_phys_attrs {
 /**
  * struct devlink_port_pci_pf_attrs - devlink port's PCI PF attributes
  * @pf: Associated PCI PF number for this port.
+ * @external: when set, indicates if a port is for an external controller
  */
 struct devlink_port_pci_pf_attrs {
 	u16 pf;
+	u8 external:1;
 };
 
 /**
  * struct devlink_port_pci_vf_attrs - devlink port's PCI VF attributes
  * @pf: Associated PCI PF number for this port.
  * @vf: Associated PCI VF for of the PCI PF for this port.
+ * @external: when set, indicates if a port is for an external controller
  */
 struct devlink_port_pci_vf_attrs {
 	u16 pf;
 	u16 vf;
+	u8 external:1;
 };
 
 /**
@@ -1215,9 +1219,9 @@ void devlink_port_type_ib_set(struct devlink_port *devlink_port,
 void devlink_port_type_clear(struct devlink_port *devlink_port);
 void devlink_port_attrs_set(struct devlink_port *devlink_port,
 			    struct devlink_port_attrs *devlink_port_attrs);
-void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf);
+void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf, bool external);
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
-				   u16 pf, u16 vf);
+				   u16 pf, u16 vf, bool external);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u32 size, u16 ingress_pools_count,
 			u16 egress_pools_count, u16 ingress_tc_count,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index cfef4245ea5a..40823ed7e05a 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -458,6 +458,7 @@ enum devlink_attr {
 	DEVLINK_ATTR_PORT_LANES,			/* u32 */
 	DEVLINK_ATTR_PORT_SPLITTABLE,			/* u8 */
 
+	DEVLINK_ATTR_PORT_EXTERNAL,		/* u8 */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 49e911c19881..6f5f85372721 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -526,6 +526,8 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		if (nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
 				attrs->pci_pf.pf))
 			return -EMSGSIZE;
+		if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_pf.external))
+			return -EMSGSIZE;
 		break;
 	case DEVLINK_PORT_FLAVOUR_PCI_VF:
 		if (nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
@@ -533,6 +535,8 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		    nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER,
 				attrs->pci_vf.vf))
 			return -EMSGSIZE;
+		if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_vf.external))
+			return -EMSGSIZE;
 		break;
 	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
 	case DEVLINK_PORT_FLAVOUR_CPU:
@@ -7716,8 +7720,9 @@ EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
  *
  *	@devlink_port: devlink port
  *	@pf: associated PF for the devlink port instance
+ *	@external: indicates if the port is for an external controller
  */
-void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf)
+void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf, bool external)
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
 	int ret;
@@ -7728,6 +7733,7 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf)
 		return;
 
 	attrs->pci_pf.pf = pf;
+	attrs->pci_pf.external = external;
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_pf_set);
 
@@ -7737,9 +7743,10 @@ EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_pf_set);
  *	@devlink_port: devlink port
  *	@pf: associated PF for the devlink port instance
  *	@vf: associated VF of a PF for the devlink port instance
+ *	@external: indicates if the port is for an external controller
  */
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
-				   u16 pf, u16 vf)
+				   u16 pf, u16 vf, bool external)
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
 	int ret;
@@ -7750,6 +7757,7 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
 		return;
 	attrs->pci_vf.pf = pf;
 	attrs->pci_vf.vf = vf;
+	attrs->pci_vf.external = external;
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next v3 5/6] devlink: Introduce controller number
  2020-09-09  4:50 ` [PATCH net-next v3 0/6] devlink show controller number Parav Pandit
                     ` (3 preceding siblings ...)
  2020-09-09  4:50   ` [PATCH net-next v3 4/6] devlink: Introduce external controller flag Parav Pandit
@ 2020-09-09  4:50   ` Parav Pandit
  2020-09-09  4:50   ` [PATCH net-next v3 6/6] devlink: Use controller while building phys_port_name Parav Pandit
  2020-09-09 15:34   ` [PATCH net-next v3 0/6] devlink show controller number Jakub Kicinski
  6 siblings, 0 replies; 51+ messages in thread
From: Parav Pandit @ 2020-09-09  4:50 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit, Jiri Pirko

From: Parav Pandit <parav@nvidia.com>

A devlink port may be for a controller consist of PCI device.
A devlink instance holds ports of two types of controllers.
(1) controller discovered on same system where eswitch resides
This is the case where PCI PF/VF of a controller and devlink eswitch
instance both are located on a single system.
(2) controller located on external host system.
This is the case where a controller is located in one system and its
devlink eswitch ports are located in a different system.

When a devlink eswitch instance serves the devlink ports of both
controllers together, PCI PF/VF numbers may overlap.
Due to this a unique phys_port_name cannot be constructed.

For example in below such system controller-0 and controller-1, each has
PCI PF pf0 whose eswitch ports can be present in controller-0.
These results in phys_port_name as "pf0" for both.
Similar problem exists for VFs and upcoming Sub functions.

An example view of two controller systems:

             ---------------------------------------------------------
             |                                                       |
             |           --------- ---------         ------- ------- |
-----------  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
| server  |  | -------   ----/---- ---/----- ------- ---/--- ---/--- |
| pci rc  |=== | pf0 |______/________/       | pf1 |___/_______/     |
| connect |  | -------                       -------                 |
-----------  |     | controller_num=1 (no eswitch)                   |
             ------|--------------------------------------------------
             (internal wire)
                   |
             ---------------------------------------------------------
             | devlink eswitch ports and reps                        |
             | ----------------------------------------------------- |
             | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
             | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
             | ----------------------------------------------------- |
             | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
             | |pf1    | pf1vfN | pf1sfN | pf1    | pf1vfN |pf0sfN | |
             | ----------------------------------------------------- |
             |                                                       |
             |                                                       |
             |           --------- ---------         ------- ------- |
             |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
             | -------   ----/---- ---/----- ------- ---/--- ---/--- |
             | | pf0 |______/________/       | pf1 |___/_______/     |
             | -------                       -------                 |
             |                                                       |
             |  local controller_num=0 (eswitch)                     |
             ---------------------------------------------------------

An example devlink port for external controller with controller
number = 1 for a VF 1 of PF 0:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0pf0vf1",
            "flavour": "pcivf",
            "controller": 1,
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
Changelog:
v2->v3:
 - Updated diagram to get rid of controller 'A' and 'B'
 - Kept ports of single controller together in diagram
 - Updated diagram for pf1's VF and SF and its ports
v1->v2:
 - Added text diagram of multiple controllers
 - Updated example for a VF
 - Addressed comments from Jiri and Jakub
 - Moved controller number attribute to PCI port flavours
   This enables to better, hirerchical view with controller and its
    PF, VF numbers
 - Split 'external' and 'controller number' attributes as two
   different attributes
 - Merged mlx5_core driver to avoid compiliation break
---
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |  9 ++++++--
 include/net/devlink.h                         |  9 ++++++--
 include/uapi/linux/devlink.h                  |  1 +
 net/core/devlink.c                            | 23 +++++++++++--------
 4 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 5b3599caa007..135ee26881c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1210,11 +1210,13 @@ is_devlink_port_supported(const struct mlx5_core_dev *dev,
 static int register_devlink_port(struct mlx5_core_dev *dev,
 				 struct mlx5e_rep_priv *rpriv)
 {
+	struct mlx5_esw_offload *offloads = &dev->priv.eswitch->offloads;
 	struct devlink *devlink = priv_to_devlink(dev);
 	struct mlx5_eswitch_rep *rep = rpriv->rep;
 	struct devlink_port_attrs attrs = {};
 	struct netdev_phys_item_id ppid = {};
 	unsigned int dl_port_index = 0;
+	u32 controller_num = 0;
 	bool external;
 	u16 pfnum;
 
@@ -1222,6 +1224,8 @@ static int register_devlink_port(struct mlx5_core_dev *dev,
 		return 0;
 
 	external = mlx5_core_is_ecpf_esw_manager(dev);
+	if (external)
+		controller_num = offloads->host_number + 1;
 	mlx5e_rep_get_port_parent_id(rpriv->netdev, &ppid);
 	dl_port_index = mlx5_esw_vport_to_devlink_port_index(dev, rep->vport);
 	pfnum = PCI_FUNC(dev->pdev->devfn);
@@ -1234,11 +1238,12 @@ static int register_devlink_port(struct mlx5_core_dev *dev,
 	} else if (rep->vport == MLX5_VPORT_PF) {
 		memcpy(rpriv->dl_port.attrs.switch_id.id, &ppid.id[0], ppid.id_len);
 		rpriv->dl_port.attrs.switch_id.id_len = ppid.id_len;
-		devlink_port_attrs_pci_pf_set(&rpriv->dl_port, pfnum, external);
+		devlink_port_attrs_pci_pf_set(&rpriv->dl_port, controller_num,
+					      pfnum, external);
 	} else if (mlx5_eswitch_is_vf_vport(dev->priv.eswitch, rpriv->rep->vport)) {
 		memcpy(rpriv->dl_port.attrs.switch_id.id, &ppid.id[0], ppid.id_len);
 		rpriv->dl_port.attrs.switch_id.id_len = ppid.id_len;
-		devlink_port_attrs_pci_vf_set(&rpriv->dl_port,
+		devlink_port_attrs_pci_vf_set(&rpriv->dl_port, controller_num,
 					      pfnum, rep->vport - 1, external);
 	}
 	return devlink_port_register(devlink, &rpriv->dl_port, dl_port_index);
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 2dad8c9151f4..eaec0a8cc5ef 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -59,21 +59,25 @@ struct devlink_port_phys_attrs {
 
 /**
  * struct devlink_port_pci_pf_attrs - devlink port's PCI PF attributes
+ * @controller: Associated controller number
  * @pf: Associated PCI PF number for this port.
  * @external: when set, indicates if a port is for an external controller
  */
 struct devlink_port_pci_pf_attrs {
+	u32 controller;
 	u16 pf;
 	u8 external:1;
 };
 
 /**
  * struct devlink_port_pci_vf_attrs - devlink port's PCI VF attributes
+ * @controller: Associated controller number
  * @pf: Associated PCI PF number for this port.
  * @vf: Associated PCI VF for of the PCI PF for this port.
  * @external: when set, indicates if a port is for an external controller
  */
 struct devlink_port_pci_vf_attrs {
+	u32 controller;
 	u16 pf;
 	u16 vf;
 	u8 external:1;
@@ -1219,8 +1223,9 @@ void devlink_port_type_ib_set(struct devlink_port *devlink_port,
 void devlink_port_type_clear(struct devlink_port *devlink_port);
 void devlink_port_attrs_set(struct devlink_port *devlink_port,
 			    struct devlink_port_attrs *devlink_port_attrs);
-void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf, bool external);
-void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
+void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 controller,
+				   u16 pf, bool external);
+void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
 				   u16 pf, u16 vf, bool external);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u32 size, u16 ingress_pools_count,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 40823ed7e05a..40d35145c879 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -459,6 +459,7 @@ enum devlink_attr {
 	DEVLINK_ATTR_PORT_SPLITTABLE,			/* u8 */
 
 	DEVLINK_ATTR_PORT_EXTERNAL,		/* u8 */
+	DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,	/* u32 */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 6f5f85372721..9cf5b118253b 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -523,17 +523,18 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		return -EMSGSIZE;
 	switch (devlink_port->attrs.flavour) {
 	case DEVLINK_PORT_FLAVOUR_PCI_PF:
-		if (nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
-				attrs->pci_pf.pf))
+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,
+				attrs->pci_pf.controller) ||
+		    nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_pf.pf))
 			return -EMSGSIZE;
 		if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_pf.external))
 			return -EMSGSIZE;
 		break;
 	case DEVLINK_PORT_FLAVOUR_PCI_VF:
-		if (nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER,
-				attrs->pci_vf.pf) ||
-		    nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER,
-				attrs->pci_vf.vf))
+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,
+				attrs->pci_vf.controller) ||
+		    nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_vf.pf) ||
+		    nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_VF_NUMBER, attrs->pci_vf.vf))
 			return -EMSGSIZE;
 		if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_vf.external))
 			return -EMSGSIZE;
@@ -7719,10 +7720,12 @@ EXPORT_SYMBOL_GPL(devlink_port_attrs_set);
  *	devlink_port_attrs_pci_pf_set - Set PCI PF port attributes
  *
  *	@devlink_port: devlink port
+ *	@controller: associated controller number for the devlink port instance
  *	@pf: associated PF for the devlink port instance
  *	@external: indicates if the port is for an external controller
  */
-void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf, bool external)
+void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 controller,
+				   u16 pf, bool external)
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
 	int ret;
@@ -7731,7 +7734,7 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u16 pf, bo
 				       DEVLINK_PORT_FLAVOUR_PCI_PF);
 	if (ret)
 		return;
-
+	attrs->pci_pf.controller = controller;
 	attrs->pci_pf.pf = pf;
 	attrs->pci_pf.external = external;
 }
@@ -7741,11 +7744,12 @@ EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_pf_set);
  *	devlink_port_attrs_pci_vf_set - Set PCI VF port attributes
  *
  *	@devlink_port: devlink port
+ *	@controller: associated controller number for the devlink port instance
  *	@pf: associated PF for the devlink port instance
  *	@vf: associated VF of a PF for the devlink port instance
  *	@external: indicates if the port is for an external controller
  */
-void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
+void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
 				   u16 pf, u16 vf, bool external)
 {
 	struct devlink_port_attrs *attrs = &devlink_port->attrs;
@@ -7755,6 +7759,7 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port,
 				       DEVLINK_PORT_FLAVOUR_PCI_VF);
 	if (ret)
 		return;
+	attrs->pci_vf.controller = controller;
 	attrs->pci_vf.pf = pf;
 	attrs->pci_vf.vf = vf;
 	attrs->pci_vf.external = external;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next v3 6/6] devlink: Use controller while building phys_port_name
  2020-09-09  4:50 ` [PATCH net-next v3 0/6] devlink show controller number Parav Pandit
                     ` (4 preceding siblings ...)
  2020-09-09  4:50   ` [PATCH net-next v3 5/6] devlink: Introduce controller number Parav Pandit
@ 2020-09-09  4:50   ` Parav Pandit
  2020-09-10 15:02     ` David Ahern
  2020-09-09 15:34   ` [PATCH net-next v3 0/6] devlink show controller number Jakub Kicinski
  6 siblings, 1 reply; 51+ messages in thread
From: Parav Pandit @ 2020-09-09  4:50 UTC (permalink / raw)
  To: kuba, davem, netdev; +Cc: Parav Pandit, Jiri Pirko

From: Parav Pandit <parav@nvidia.com>

Now that controller number attribute is available, use it when
building phsy_port_name for external controller ports.

An example devlink port and representor netdev name consist of controller
annotation for external controller with controller number = 1,
for a VF 1 of PF 0:

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "ens2f0c1pf0vf1",
            "flavour": "pcivf",
            "controller": 1,
            "pfnum": 0,
            "vfnum": 1,
            "external": true,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00"
            }
        }
    }
}

Controller number annotation is skipped for non external controllers to
maintain backward compatibility.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
Changelog:
v1->v2:
 - New patch
---
 net/core/devlink.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 9cf5b118253b..91c12612f2b7 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -7793,9 +7793,23 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 		WARN_ON(1);
 		return -EINVAL;
 	case DEVLINK_PORT_FLAVOUR_PCI_PF:
+		if (attrs->pci_pf.external) {
+			n = snprintf(name, len, "c%u", attrs->pci_pf.controller);
+			if (n >= len)
+				return -EINVAL;
+			len -= n;
+			name += n;
+		}
 		n = snprintf(name, len, "pf%u", attrs->pci_pf.pf);
 		break;
 	case DEVLINK_PORT_FLAVOUR_PCI_VF:
+		if (attrs->pci_vf.external) {
+			n = snprintf(name, len, "c%u", attrs->pci_vf.controller);
+			if (n >= len)
+				return -EINVAL;
+			len -= n;
+			name += n;
+		}
 		n = snprintf(name, len, "pf%uvf%u",
 			     attrs->pci_vf.pf, attrs->pci_vf.vf);
 		break;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next v3 0/6] devlink show controller number
  2020-09-09  4:50 ` [PATCH net-next v3 0/6] devlink show controller number Parav Pandit
                     ` (5 preceding siblings ...)
  2020-09-09  4:50   ` [PATCH net-next v3 6/6] devlink: Use controller while building phys_port_name Parav Pandit
@ 2020-09-09 15:34   ` Jakub Kicinski
  2020-09-09 21:20     ` David Miller
  6 siblings, 1 reply; 51+ messages in thread
From: Jakub Kicinski @ 2020-09-09 15:34 UTC (permalink / raw)
  To: Parav Pandit; +Cc: davem, netdev, Parav Pandit

On Wed,  9 Sep 2020 07:50:32 +0300 Parav Pandit wrote:
> From: Parav Pandit <parav@nvidia.com>
> 
> Hi Jakub, Dave,
> 
> Currently a devlink instance that supports an eswitch handles eswitch
> ports of two type of controllers.
> (1) controller discovered on same system where eswitch resides.
> This is the case where PCI PF/VF of a controller and devlink eswitch
> instance both are located on a single system.
> (2) controller located on external system.
> This is the case where a controller is plugged in one system and its
> devlink eswitch ports are located in a different system. In this case
> devlink instance of the eswitch only have access to ports of the
> controller.
> However, there is no way to describe that a eswitch devlink port
> belongs to which controller (mainly which external host controller).
> This problem is more prevalent when port attribute such as PF and VF
> numbers are overlapping between multiple controllers of same eswitch.
> Due to this, for a specific switch_id, unique phys_port_name cannot
> be constructed for such devlink ports.

Acked-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next v3 0/6] devlink show controller number
  2020-09-09 15:34   ` [PATCH net-next v3 0/6] devlink show controller number Jakub Kicinski
@ 2020-09-09 21:20     ` David Miller
  0 siblings, 0 replies; 51+ messages in thread
From: David Miller @ 2020-09-09 21:20 UTC (permalink / raw)
  To: kuba; +Cc: parav, netdev, parav

From: Jakub Kicinski <kuba@kernel.org>
Date: Wed, 9 Sep 2020 08:34:42 -0700

> On Wed,  9 Sep 2020 07:50:32 +0300 Parav Pandit wrote:
>> From: Parav Pandit <parav@nvidia.com>
>> 
>> Hi Jakub, Dave,
>> 
>> Currently a devlink instance that supports an eswitch handles eswitch
>> ports of two type of controllers.
>> (1) controller discovered on same system where eswitch resides.
>> This is the case where PCI PF/VF of a controller and devlink eswitch
>> instance both are located on a single system.
>> (2) controller located on external system.
>> This is the case where a controller is plugged in one system and its
>> devlink eswitch ports are located in a different system. In this case
>> devlink instance of the eswitch only have access to ports of the
>> controller.
>> However, there is no way to describe that a eswitch devlink port
>> belongs to which controller (mainly which external host controller).
>> This problem is more prevalent when port attribute such as PF and VF
>> numbers are overlapping between multiple controllers of same eswitch.
>> Due to this, for a specific switch_id, unique phys_port_name cannot
>> be constructed for such devlink ports.
> 
> Acked-by: Jakub Kicinski <kuba@kernel.org>

Series applied, thanks everyone.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next v3 6/6] devlink: Use controller while building phys_port_name
  2020-09-09  4:50   ` [PATCH net-next v3 6/6] devlink: Use controller while building phys_port_name Parav Pandit
@ 2020-09-10 15:02     ` David Ahern
  0 siblings, 0 replies; 51+ messages in thread
From: David Ahern @ 2020-09-10 15:02 UTC (permalink / raw)
  To: Parav Pandit, kuba, davem, netdev; +Cc: Parav Pandit, Jiri Pirko

On 9/8/20 10:50 PM, Parav Pandit wrote:
> $ devlink port show pci/0000:06:00.0/2
> pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
>   function:
>     hw_addr 00:00:00:00:00:00
> 
> $ devlink port show pci/0000:06:00.0/2 -jp
> {
>     "port": {
>         "pci/0000:06:00.0/2": {
>             "type": "eth",
>             "netdev": "ens2f0c1pf0vf1",

That strlen is 14 chars. Any 2 ids go to a second digit and you overrrun
the IFNAMSZ which means ...

> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index 9cf5b118253b..91c12612f2b7 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -7793,9 +7793,23 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
>  		WARN_ON(1);
>  		return -EINVAL;
>  	case DEVLINK_PORT_FLAVOUR_PCI_PF:
> +		if (attrs->pci_pf.external) {
> +			n = snprintf(name, len, "c%u", attrs->pci_pf.controller);
> +			if (n >= len)
> +				return -EINVAL;

...  this function returns EINVAL which is going to be confusing to users.

> +			len -= n;
> +			name += n;
> +		}
>  		n = snprintf(name, len, "pf%u", attrs->pci_pf.pf);
>  		break;
>  	case DEVLINK_PORT_FLAVOUR_PCI_VF:
> +		if (attrs->pci_vf.external) {
> +			n = snprintf(name, len, "c%u", attrs->pci_vf.controller);
> +			if (n >= len)
> +				return -EINVAL;
> +			len -= n;
> +			name += n;
> +		}
>  		n = snprintf(name, len, "pf%uvf%u",
>  			     attrs->pci_vf.pf, attrs->pci_vf.vf);
>  		break;
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2020-09-10 19:58 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-25 13:58 [PATCH net-next 0/3] devlink show controller info Parav Pandit
2020-08-25 13:58 ` [PATCH net-next 1/3] devlink: Add comment block for missing port attributes Parav Pandit
2020-08-25 13:58 ` [PATCH net-next 2/3] devlink: Consider other controller while building phys_port_name Parav Pandit
2020-08-26  0:32   ` Jakub Kicinski
2020-08-26  4:27     ` Parav Pandit
2020-08-26 20:07       ` Jakub Kicinski
2020-08-27  4:31         ` Parav Pandit
2020-08-27 18:32           ` Jakub Kicinski
2020-08-27 20:15             ` Parav Pandit
2020-08-27 21:42               ` Jakub Kicinski
2020-08-28  4:27                 ` Parav Pandit
2020-08-28  5:08                   ` Parav Pandit
2020-08-28 16:43                   ` Jakub Kicinski
2020-08-29  3:43                     ` Parav Pandit
2020-09-01  8:19                       ` Jiri Pirko
2020-09-01  8:53                         ` Parav Pandit
2020-09-01  9:17                           ` Jiri Pirko
2020-09-01 21:28                             ` Jakub Kicinski
2020-09-02  4:26                               ` Parav Pandit
2020-09-02  4:44                                 ` Parav Pandit
2020-09-02  8:00                                 ` Jiri Pirko
2020-09-02 15:23                                   ` Jakub Kicinski
2020-09-02 16:18                                     ` Parav Pandit
2020-09-02 20:10                                       ` Parav Pandit
2020-09-03  5:54                                     ` Jiri Pirko
2020-09-03 19:31                                       ` Jakub Kicinski
2020-09-04  8:43                                         ` Jiri Pirko
2020-09-06  3:08                                           ` Parav Pandit
2020-09-06 16:46                                             ` Jakub Kicinski
2020-09-07  7:21                                             ` Jiri Pirko
2020-09-07 16:18                                               ` Jakub Kicinski
2020-08-25 13:58 ` [PATCH net-next 3/3] net/mlx5: E-switch, Set controller attribute for PCI PF and VF ports Parav Pandit
2020-09-08 14:42 ` [PATCH net-next v2 0/6] devlink show controller number Parav Pandit
2020-09-08 14:42   ` [PATCH net-next v2 1/6] net/mlx5: E-switch, Read controller number from device Parav Pandit
2020-09-08 14:42   ` [PATCH net-next v2 2/6] devlink: Add comment block for missing port attributes Parav Pandit
2020-09-08 14:42   ` [PATCH net-next v2 3/6] devlink: Move structure comments outside of structure Parav Pandit
2020-09-08 14:42   ` [PATCH net-next v2 4/6] devlink: Introduce external controller flag Parav Pandit
2020-09-08 14:42   ` [PATCH net-next v2 5/6] devlink: Introduce controller number Parav Pandit
2020-09-08 18:50     ` Jakub Kicinski
2020-09-09  3:06       ` Parav Pandit
2020-09-08 14:42   ` [PATCH net-next v2 6/6] devlink: Use controller while building phys_port_name Parav Pandit
2020-09-09  4:50 ` [PATCH net-next v3 0/6] devlink show controller number Parav Pandit
2020-09-09  4:50   ` [PATCH net-next v3 1/6] net/mlx5: E-switch, Read controller number from device Parav Pandit
2020-09-09  4:50   ` [PATCH net-next v3 2/6] devlink: Add comment block for missing port attributes Parav Pandit
2020-09-09  4:50   ` [PATCH net-next v3 3/6] devlink: Move structure comments outside of structure Parav Pandit
2020-09-09  4:50   ` [PATCH net-next v3 4/6] devlink: Introduce external controller flag Parav Pandit
2020-09-09  4:50   ` [PATCH net-next v3 5/6] devlink: Introduce controller number Parav Pandit
2020-09-09  4:50   ` [PATCH net-next v3 6/6] devlink: Use controller while building phys_port_name Parav Pandit
2020-09-10 15:02     ` David Ahern
2020-09-09 15:34   ` [PATCH net-next v3 0/6] devlink show controller number Jakub Kicinski
2020-09-09 21:20     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).