* [PATCH net-next v2 00/14] Add mlx5 subfunction support
@ 2020-12-09  7:29 saeed
From: saeed @ 2020-12-09  7:29 UTC
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit

From: Parav Pandit <parav@nvidia.com>

Hi Dave, Jakub, Jason,

This series from Parav was the theme of this mlx5 release cycle.
We've been waiting anxiously for the auxbus infrastructure to make it into
the kernel, and now that auxbus is in and all the stars are aligned, I
can finally submit this V2 of the devlink and mlx5 subfunction support.

Subfunctions came to solve the scaling issue of virtualization
and switchdev environments, where SRIOV failed to deliver and users ran
out of VFs very quickly, as SRIOV demands a huge amount of physical
resources in both the servers and the NIC.

Subfunctions provide the same functionality as SRIOV but in a very
lightweight manner. Please see the thorough and detailed
documentation from Parav below, in the commit messages and in the
networking documentation patches at the end of this series.

Sending V2 as a continuation of V1, which was sent last month [0].
Parav has provided a full changelog in the commit message of each patch.
[0] https://lore.kernel.org/linux-rdma/20201112192424.2742-1-parav@nvidia.com/

Parav Pandit Says:
=================

This patchset introduces support for mlx5 subfunction (SF).

A subfunction is a lightweight function that has a parent PCI function on
which it is deployed. An mlx5 subfunction has its own function capabilities
and its own resources. This means a subfunction has its own dedicated
queues (txq, rxq, cq, eq). These queues are neither shared with nor stolen
from the parent PCI function.

When a subfunction is RDMA capable, it has its own QP1, GID table and RDMA
resources, neither shared with nor stolen from the parent PCI function.

A subfunction has a dedicated window in PCI BAR space that is not shared
with the other subfunctions or the parent PCI function. This ensures that
all class devices of the subfunction access only the assigned PCI BAR space.

A subfunction supports eswitch representation, through which it supports tc
offloads. The user must configure the eswitch to send/receive packets
from/to the subfunction port.

Subfunctions share PCI level resources such as PCI MSI-X IRQs with
other subfunctions and/or with their parent PCI function.

Patch summary:
--------------
Patches 1 to 4 prepare devlink
Patches 5 to 7 add mlx5 SF device support
Patches 8 to 11 add mlx5 SF devlink port support
Patches 12 to 14 add documentation

Patch-1 prepares code to handle multiple port function attributes
Patch-2 introduces devlink pcisf port flavour similar to pcipf and pcivf
Patch-3 adds port add and delete driver callbacks
Patch-4 adds port function state get and set callbacks
Patch-5 adds mlx5 vhca event notifier support to distribute subfunction
        state change notifications
Patch-6 adds SF auxiliary device
Patch-7 adds SF auxiliary driver
Patch-8 prepares eswitch to handle SF vport
Patch-9 adds eswitch helpers to add/remove SF vport
Patch-10 implements devlink port add/del callbacks
Patch-11 implements devlink port function get/set callbacks
Patch-12 to 14 adds documentation
Patch-12 adds devlink port documentation
Patch-13 adds subfunction documentation
Patch-14 adds mlx5 subfunction documentation

Subfunction support is discussed in detail in RFC [1] and [2].
RFC [1] and its extension [2] describe the requirements, design and
proposed plumbing using devlink, auxiliary bus and sysfs for systemd/udev
support. The functionality of this patchset is best explained using the
real examples further below.

overview:
--------
A subfunction can be created and deleted by a user using devlink port
add/delete interface.

A subfunction can be configured using devlink port function attributes
before it is activated.

When a subfunction is activated, it results in an auxiliary device on
the host PCI device where it is deployed. A driver binds to the
auxiliary device that further creates supported class devices.

example subfunction usage sequence:
-----------------------------------
Change device to switchdev mode:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

Add a devlink port of subfunction flavour:
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88

Configure the MAC address of the port function:
$ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88

Now activate the function:
$ devlink port function set ens2f0npf0sf88 state active

Now use the auxiliary device and class devices:
$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4

$ ip link show
127: ens2f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:b3:d1:12 brd ff:ff:ff:ff:ff:ff
    altname enp6s0f0np0
129: p0sf88: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:88:88 brd ff:ff:ff:ff:ff:ff

$ rdma dev show
43: rdmap6s0f0: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d112 sys_image_guid 248a:0703:00b3:d112
44: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

After use, deactivate the function:
$ devlink port function set ens2f0npf0sf88 state inactive

Now delete the subfunction port:
$ devlink port del ens2f0npf0sf88

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
[2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2

=================
---
Changelog:
v1->v2:
 - added documentation for subfunction and its mlx5 implementation
 - add MLX5_SF config option documentation
 - rebased
 - dropped devlink global lock improvement patch as mlx5 doesn't support
   reload while SFs are allocated
 - dropped devlink reload lock patch as mlx5 doesn't support reload
   when SFs are allocated
 - using updated vhca event from device to add remove auxiliary device
 - split sf devlink port allocation and sf hardware context allocation

Parav Pandit (13):
  devlink: Prepare code to fill multiple port function attributes
  devlink: Introduce PCI SF port flavour and port attribute
  devlink: Support add and delete devlink port
  devlink: Support get and set state of port function
  net/mlx5: Introduce vhca state event notifier
  net/mlx5: SF, Add auxiliary device support
  net/mlx5: SF, Add auxiliary device driver
  net/mlx5: E-switch, Add eswitch helpers for SF vport
  net/mlx5: SF, Add port add delete functionality
  net/mlx5: SF, Port function state change support
  devlink: Add devlink port documentation
  devlink: Extend devlink port documentation for subfunctions
  net/mlx5: Add devlink subfunction port documentation

Vu Pham (1):
  net/mlx5: E-switch, Prepare eswitch to handle SF vport

 Documentation/driver-api/auxiliary_bus.rst    |   2 +
 .../device_drivers/ethernet/mellanox/mlx5.rst | 209 +++++++
 .../networking/devlink/devlink-port.rst       | 199 +++++++
 Documentation/networking/devlink/index.rst    |   1 +
 .../net/ethernet/mellanox/mlx5/core/Kconfig   |  19 +
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   9 +
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |   8 +
 .../net/ethernet/mellanox/mlx5/core/devlink.c |  19 +
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |   5 +-
 .../mellanox/mlx5/core/esw/acl/egress_ofld.c  |   2 +-
 .../mellanox/mlx5/core/esw/devlink_port.c     |  41 ++
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |  48 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  78 +++
 .../mellanox/mlx5/core/eswitch_offloads.c     |  47 +-
 .../net/ethernet/mellanox/mlx5/core/events.c  |   7 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |  60 +-
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  12 +
 .../net/ethernet/mellanox/mlx5/core/pci_irq.c |  20 +
 .../net/ethernet/mellanox/mlx5/core/sf/cmd.c  |  48 ++
 .../ethernet/mellanox/mlx5/core/sf/dev/dev.c  | 271 +++++++++
 .../ethernet/mellanox/mlx5/core/sf/dev/dev.h  |  55 ++
 .../mellanox/mlx5/core/sf/dev/driver.c        | 101 ++++
 .../ethernet/mellanox/mlx5/core/sf/devlink.c  | 552 ++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/sf/hw_table.c | 235 ++++++++
 .../mlx5/core/sf/mlx5_ifc_vhca_event.h        |  82 +++
 .../net/ethernet/mellanox/mlx5/core/sf/priv.h |  21 +
 .../net/ethernet/mellanox/mlx5/core/sf/sf.h   |  92 +++
 .../mellanox/mlx5/core/sf/vhca_event.c        | 189 ++++++
 .../mellanox/mlx5/core/sf/vhca_event.h        |  57 ++
 .../net/ethernet/mellanox/mlx5/core/vport.c   |   3 +-
 include/linux/mlx5/driver.h                   |  16 +-
 include/net/devlink.h                         |  79 +++
 include/uapi/linux/devlink.h                  |  26 +
 net/core/devlink.c                            | 266 ++++++++-
 34 files changed, 2832 insertions(+), 47 deletions(-)
 create mode 100644 Documentation/networking/devlink/devlink-port.rst
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/mlx5_ifc_vhca_event.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.h

-- 
2.26.2



* [PATCH net-next v2 01/14] devlink: Prepare code to fill multiple port function attributes
From: saeed @ 2020-12-09  7:29 UTC
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit, Jiri Pirko, Vu Pham, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

Prepare code to fill zero or more port function optional attributes.
Subsequent patch makes use of this to fill more port function
attributes.
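
For readers who want to see the shape of this pattern outside the kernel,
here is a minimal userspace sketch, under stated assumptions. It is not the
devlink code: struct toy_msg, toy_put() and the getter type are hypothetical
stand-ins for the skb and the netlink helpers, and the loop over several
getters is an assumption about how later patches might extend the single
hw_addr fill. It only illustrates "open a nest, fill zero or more optional
attributes, keep the nest only if something was written".

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy message buffer; a hypothetical stand-in for struct sk_buff. */
struct toy_msg {
	unsigned char data[64];
	size_t len;
};

/* Optional attribute getter: returns 0 and fills *val, or a negative errno. */
typedef int (*toy_attr_get_t)(unsigned char *val);

/* Example getters (hypothetical): one supported, one not supported. */
int toy_get_hw_addr(unsigned char *val)
{
	*val = 0x88;
	return 0;
}

int toy_get_unsupported(unsigned char *val)
{
	(void)val;
	return -EOPNOTSUPP;
}

/* Append one byte-valued attribute; fails when the buffer is full. */
int toy_put(struct toy_msg *msg, unsigned char val)
{
	if (msg->len >= sizeof(msg->data))
		return -EMSGSIZE;
	msg->data[msg->len++] = val;
	return 0;
}

/*
 * Fill one optional attribute. A missing getter or -EOPNOTSUPP is not an
 * error: the attribute is skipped and *msg_updated is left unchanged.
 */
int toy_attr_fill(struct toy_msg *msg, toy_attr_get_t get, bool *msg_updated)
{
	unsigned char val;
	int err;

	if (!get)
		return 0;
	err = get(&val);
	if (err == -EOPNOTSUPP)
		return 0;
	if (err)
		return err;
	err = toy_put(msg, val);
	if (err)
		return err;
	*msg_updated = true;
	return 0;
}

/*
 * Open a nest, try each optional attribute in turn, then keep the nest only
 * if no error occurred and at least one attribute was written.
 */
int toy_function_attrs_put(struct toy_msg *msg, const toy_attr_get_t *getters,
			   size_t count)
{
	size_t nest_start = msg->len;	/* remember where the nest begins */
	bool msg_updated = false;
	size_t i;
	int err;

	err = toy_put(msg, 0xff);	/* placeholder nest header */
	if (err)
		return err;
	for (i = 0; i < count; i++) {
		err = toy_attr_fill(msg, getters[i], &msg_updated);
		if (err)
			break;
	}
	if (err || !msg_updated)
		msg->len = nest_start;	/* cancel: drop header and contents */
	return err;
}
```

The per-attribute helper reports success through the msg_updated flag rather
than its return value, so an unsupported attribute and a written attribute
can both return 0 while the caller still knows whether the nest is empty.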

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 net/core/devlink.c | 63 +++++++++++++++++++++++-----------------------
 1 file changed, 32 insertions(+), 31 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 88c0ac8ed444..d812ed26a330 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -712,6 +712,31 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 	return 0;
 }
 
+static int
+devlink_port_function_hw_addr_fill(struct devlink *devlink, const struct devlink_ops *ops,
+				   struct devlink_port *port, struct sk_buff *msg,
+				   struct netlink_ext_ack *extack, bool *msg_updated)
+{
+	u8 hw_addr[MAX_ADDR_LEN];
+	int hw_addr_len;
+	int err;
+
+	if (!ops->port_function_hw_addr_get)
+		return 0;
+
+	err = ops->port_function_hw_addr_get(devlink, port, hw_addr, &hw_addr_len, extack);
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			return 0;
+		return err;
+	}
+	err = nla_put(msg, DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR, hw_addr_len, hw_addr);
+	if (err)
+		return err;
+	*msg_updated = true;
+	return 0;
+}
+
 static int
 devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port,
 				   struct netlink_ext_ack *extack)
@@ -719,36 +744,16 @@ devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *por
 	struct devlink *devlink = port->devlink;
 	const struct devlink_ops *ops;
 	struct nlattr *function_attr;
-	bool empty_nest = true;
-	int err = 0;
+	bool msg_updated = false;
+	int err;
 
 	function_attr = nla_nest_start_noflag(msg, DEVLINK_ATTR_PORT_FUNCTION);
 	if (!function_attr)
 		return -EMSGSIZE;
 
 	ops = devlink->ops;
-	if (ops->port_function_hw_addr_get) {
-		int hw_addr_len;
-		u8 hw_addr[MAX_ADDR_LEN];
-
-		err = ops->port_function_hw_addr_get(devlink, port, hw_addr, &hw_addr_len, extack);
-		if (err == -EOPNOTSUPP) {
-			/* Port function attributes are optional for a port. If port doesn't
-			 * support function attribute, returning -EOPNOTSUPP is not an error.
-			 */
-			err = 0;
-			goto out;
-		} else if (err) {
-			goto out;
-		}
-		err = nla_put(msg, DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR, hw_addr_len, hw_addr);
-		if (err)
-			goto out;
-		empty_nest = false;
-	}
-
-out:
-	if (err || empty_nest)
+	err = devlink_port_function_hw_addr_fill(devlink, ops, port, msg, extack, &msg_updated);
+	if (err || !msg_updated)
 		nla_nest_cancel(msg, function_attr);
 	else
 		nla_nest_end(msg, function_attr);
@@ -986,7 +991,6 @@ devlink_port_function_hw_addr_set(struct devlink *devlink, struct devlink_port *
 	const struct devlink_ops *ops;
 	const u8 *hw_addr;
 	int hw_addr_len;
-	int err;
 
 	hw_addr = nla_data(attr);
 	hw_addr_len = nla_len(attr);
@@ -1011,12 +1015,7 @@ devlink_port_function_hw_addr_set(struct devlink *devlink, struct devlink_port *
 		return -EOPNOTSUPP;
 	}
 
-	err = ops->port_function_hw_addr_set(devlink, port, hw_addr, hw_addr_len, extack);
-	if (err)
-		return err;
-
-	devlink_port_notify(port, DEVLINK_CMD_PORT_NEW);
-	return 0;
+	return ops->port_function_hw_addr_set(devlink, port, hw_addr, hw_addr_len, extack);
 }
 
 static int
@@ -1037,6 +1036,8 @@ devlink_port_function_set(struct devlink *devlink, struct devlink_port *port,
 	if (attr)
 		err = devlink_port_function_hw_addr_set(devlink, port, attr, extack);
 
+	if (!err)
+		devlink_port_notify(port, DEVLINK_CMD_PORT_NEW);
 	return err;
 }
 
-- 
2.26.2



* [PATCH net-next v2 02/14] devlink: Introduce PCI SF port flavour and port attribute
From: saeed @ 2020-12-09  7:29 UTC
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit, Jiri Pirko, Vu Pham, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

A PCI sub-function (SF) represents a portion of the device similar
to PCI VF.

In an eswitch, a PCI SF may have a port, which is normally represented
using a representor netdevice.
To give better visibility of the eswitch port, its association with the SF,
and its representor netdevice, introduce a PCI SF port flavour.

When devlink port flavour is PCI SF, fill up PCI SF attributes of the
port.

Extend port name creation using the PCI PF and SF number scheme on a best
effort basis, so that vendor drivers can skip defining their own
scheme.
This is done as cApfNsfM, where A, N and M are the controller, PCI PF and
PCI SF numbers respectively.
This is similar to the existing naming for PCI PF and PCI VF ports.
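
The naming rule can be sketched in userspace C as follows. This is only an
illustration modelled on the snprintf() calls in the diff of this patch:
struct sf_port_attrs and sf_port_name() are hypothetical names, and the -1
return value is a simplification of the kernel's error code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the PCI SF port attributes. */
struct sf_port_attrs {
	unsigned int controller;
	unsigned int pf;
	unsigned int sf;
	bool external;
};

/*
 * Build "pf<N>sf<M>", prefixed with "c<A>" only for an external controller.
 * Returns 0 on success, -1 if the name does not fit in the buffer.
 */
int sf_port_name(const struct sf_port_attrs *attrs, char *name, size_t len)
{
	int n;

	if (attrs->external) {
		n = snprintf(name, len, "c%u", attrs->controller);
		if (n < 0 || (size_t)n >= len)
			return -1;
		name += n;
		len -= (size_t)n;
	}
	n = snprintf(name, len, "pf%usf%u", attrs->pf, attrs->sf);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

With pfnum 0 and sfnum 88 on the local controller this yields "pf0sf88",
which is the suffix seen in the netdev name ens2f0npf0sf88 in the examples.
The controller prefix is emitted only when the port is marked external.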

An example view of a PCI SF port:

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state active opstate attached

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "external": false,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/net/devlink.h        | 17 +++++++++++++
 include/uapi/linux/devlink.h |  5 ++++
 net/core/devlink.c           | 46 ++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index f466819cc477..5bd43f0a79a8 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -93,6 +93,20 @@ struct devlink_port_pci_vf_attrs {
 	u8 external:1;
 };
 
+/**
+ * struct devlink_port_pci_sf_attrs - devlink port's PCI SF attributes
+ * @controller: Associated controller number
+ * @pf: Associated PCI PF number for this port.
+ * @sf: Associated PCI SF for of the PCI PF for this port.
+ * @external: when set, indicates if a port is for an external controller
+ */
+struct devlink_port_pci_sf_attrs {
+	u32 controller;
+	u16 pf;
+	u32 sf;
+	u8 external:1;
+};
+
 /**
  * struct devlink_port_attrs - devlink port object
  * @flavour: flavour of the port
@@ -114,6 +128,7 @@ struct devlink_port_attrs {
 		struct devlink_port_phys_attrs phys;
 		struct devlink_port_pci_pf_attrs pci_pf;
 		struct devlink_port_pci_vf_attrs pci_vf;
+		struct devlink_port_pci_sf_attrs pci_sf;
 	};
 };
 
@@ -1404,6 +1419,8 @@ void devlink_port_attrs_pci_pf_set(struct devlink_port *devlink_port, u32 contro
 				   u16 pf, bool external);
 void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 controller,
 				   u16 pf, u16 vf, bool external);
+void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 controller,
+				   u16 pf, u32 sf, bool external);
 int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u32 size, u16 ingress_pools_count,
 			u16 egress_pools_count, u16 ingress_tc_count,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 5203f54a2be1..6fe00f10eb3f 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -200,6 +200,10 @@ enum devlink_port_flavour {
 	DEVLINK_PORT_FLAVOUR_UNUSED, /* Port which exists in the switch, but
 				      * is not used in any way.
 				      */
+	DEVLINK_PORT_FLAVOUR_PCI_SF, /* Represents eswitch port
+				      * for the PCI SF. It is an internal
+				      * port that faces the PCI SF.
+				      */
 };
 
 enum devlink_param_cmode {
@@ -529,6 +533,7 @@ enum devlink_attr {
 	DEVLINK_ATTR_RELOAD_ACTION_INFO,        /* nested */
 	DEVLINK_ATTR_RELOAD_ACTION_STATS,       /* nested */
 
+	DEVLINK_ATTR_PORT_PCI_SF_NUMBER,	/* u32 */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index d812ed26a330..8a33b3150d1a 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -690,6 +690,15 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 		if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_vf.external))
 			return -EMSGSIZE;
 		break;
+	case DEVLINK_PORT_FLAVOUR_PCI_SF:
+		if (nla_put_u32(msg, DEVLINK_ATTR_PORT_CONTROLLER_NUMBER,
+				attrs->pci_sf.controller) ||
+		    nla_put_u16(msg, DEVLINK_ATTR_PORT_PCI_PF_NUMBER, attrs->pci_sf.pf) ||
+		    nla_put_u32(msg, DEVLINK_ATTR_PORT_PCI_SF_NUMBER, attrs->pci_sf.sf))
+			return -EMSGSIZE;
+		if (nla_put_u8(msg, DEVLINK_ATTR_PORT_EXTERNAL, attrs->pci_sf.external))
+			return -EMSGSIZE;
+		break;
 	case DEVLINK_PORT_FLAVOUR_PHYSICAL:
 	case DEVLINK_PORT_FLAVOUR_CPU:
 	case DEVLINK_PORT_FLAVOUR_DSA:
@@ -8378,6 +8387,33 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_vf_set);
 
+/**
+ *	devlink_port_attrs_pci_sf_set - Set PCI SF port attributes
+ *
+ *	@devlink_port: devlink port
+ *	@controller: associated controller number for the devlink port instance
+ *	@pf: associated PF for the devlink port instance
+ *	@sf: associated SF of a PF for the devlink port instance
+ *	@external: indicates if the port is for an external controller
+ */
+void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 controller,
+				   u16 pf, u32 sf, bool external)
+{
+	struct devlink_port_attrs *attrs = &devlink_port->attrs;
+	int ret;
+
+	if (WARN_ON(devlink_port->registered))
+		return;
+	ret = __devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_PCI_SF);
+	if (ret)
+		return;
+	attrs->pci_sf.controller = controller;
+	attrs->pci_sf.pf = pf;
+	attrs->pci_sf.sf = sf;
+	attrs->pci_sf.external = external;
+}
+EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_sf_set);
+
 static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 					     char *name, size_t len)
 {
@@ -8426,6 +8462,16 @@ static int __devlink_port_phys_port_name_get(struct devlink_port *devlink_port,
 		n = snprintf(name, len, "pf%uvf%u",
 			     attrs->pci_vf.pf, attrs->pci_vf.vf);
 		break;
+	case DEVLINK_PORT_FLAVOUR_PCI_SF:
+		if (attrs->pci_sf.external) {
+			n = snprintf(name, len, "c%u", attrs->pci_sf.controller);
+			if (n >= len)
+				return -EINVAL;
+			len -= n;
+			name += n;
+		}
+		n = snprintf(name, len, "pf%usf%u", attrs->pci_sf.pf, attrs->pci_sf.sf);
+		break;
 	}
 
 	if (n >= len)
-- 
2.26.2



* [PATCH net-next v2 03/14] devlink: Support add and delete devlink port
From: saeed @ 2020-12-09  7:29 UTC
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit, Jiri Pirko, Vu Pham, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

Extend the devlink interface so that the user can add and delete a port.
Extend devlink to connect user requests to the driver to add/delete
such a port in the device.

When driver routines are invoked, the devlink instance lock is not held.
This enables the driver to register and unregister several devlink
objects (port, health reporter, resource, etc.) by using existing
devlink APIs.
This also helps to use the same code for port unregistration
during driver unload and during port deletion initiated by the user.
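
As the diff in this patch shows, the core hands the parsed user request to
the driver as a bundle of attributes with validity bits. A driver-side check
might look roughly like the userspace sketch below. Everything here is a
hypothetical illustration: the toy_ names, the flavour values and the
specific policy (SF ports on PF 0 of the local controller only) are
assumptions, not what mlx5 implements.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flavour values; the real ones live in the devlink uapi header. */
enum toy_port_flavour {
	TOY_PORT_FLAVOUR_PHYSICAL,
	TOY_PORT_FLAVOUR_PCI_SF,
};

/* Userspace mirror of the attribute bundle passed to the port_new callback. */
struct toy_port_new_attrs {
	enum toy_port_flavour flavour;
	unsigned int port_index;
	uint32_t controller;
	uint32_t sfnum;
	uint16_t pfnum;
	unsigned int port_index_valid:1,
		     controller_valid:1,
		     sfnum_valid:1;
};

/*
 * Hypothetical driver-side check: accept only SF ports on PF 0 of the local
 * controller, and require the user to supply an SF number.
 */
int toy_port_new_check(const struct toy_port_new_attrs *attrs)
{
	if (attrs->flavour != TOY_PORT_FLAVOUR_PCI_SF)
		return -EOPNOTSUPP;	/* unsupported flavour */
	if (attrs->port_index_valid)
		return -EOPNOTSUPP;	/* this toy driver picks the index itself */
	if (attrs->controller_valid && attrs->controller != 0)
		return -EOPNOTSUPP;	/* only the local controller is handled */
	if (!attrs->sfnum_valid)
		return -EOPNOTSUPP;	/* an SF number is mandatory here */
	if (attrs->pfnum != 0)
		return -EOPNOTSUPP;	/* only PF 0 is handled */
	return 0;
}
```

The validity bits let the driver tell "the user did not pass this attribute"
apart from "the user passed zero", which matters for optional fields such as
the controller number.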

Examples of add, show and delete commands:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev eth0 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ udevadm test-builtin net_id /sys/class/net/eth0
Load module index
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Using default interface naming scheme 'v245'.
ID_NET_NAMING_SCHEME=v245
ID_NET_NAME_PATH=enp6s0f0npf0sf88
ID_NET_NAME_SLOT=ens2f0npf0sf88
Unload module index
Unloaded link configuration context.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/net/devlink.h | 39 ++++++++++++++++++++++++
 net/core/devlink.c    | 71 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 110 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 5bd43f0a79a8..f8cff3e402da 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -153,6 +153,17 @@ struct devlink_port {
 	struct mutex reporters_lock; /* Protects reporter_list */
 };
 
+struct devlink_port_new_attrs {
+	enum devlink_port_flavour flavour;
+	unsigned int port_index;
+	u32 controller;
+	u32 sfnum;
+	u16 pfnum;
+	u8 port_index_valid:1,
+	   controller_valid:1,
+	   sfnum_valid:1;
+};
+
 struct devlink_sb_pool_info {
 	enum devlink_sb_pool_type pool_type;
 	u32 size;
@@ -1363,6 +1374,34 @@ struct devlink_ops {
 	int (*port_function_hw_addr_set)(struct devlink *devlink, struct devlink_port *port,
 					 const u8 *hw_addr, int hw_addr_len,
 					 struct netlink_ext_ack *extack);
+	/**
+	 * @port_new: Port add function.
+	 *
+	 * Should be used by device driver to let caller add new port of a
+	 * specified flavour with optional attributes.
+	 * Driver should return -EOPNOTSUPP if it doesn't support port addition
+	 * of a specified flavour or specified attributes. Driver should set
+	 * extack error message in case of fail to add the port. Devlink core
+	 * does not hold a devlink instance lock when this callback is invoked.
+	 * Driver must ensures synchronization when adding or deleting a port.
+	 * Driver must register a port with devlink core.
+	 */
+	int (*port_new)(struct devlink *devlink,
+			const struct devlink_port_new_attrs *attrs,
+			struct netlink_ext_ack *extack);
+	/**
+	 * @port_del: Port delete function.
+	 *
+	 * Should be used by device driver to let caller delete port which was
+	 * previously created using port_new() callback.
+	 * Driver should return -EOPNOTSUPP if it doesn't support port deletion.
+	 * Driver should set extack error message in case of fail to delete the
+	 * port. Devlink core does not hold a devlink instance lock when this
+	 * callback is invoked. Driver must ensures synchronization when adding
+	 * or deleting a port. Driver must register a port with devlink core.
+	 */
+	int (*port_del)(struct devlink *devlink, unsigned int port_index,
+			struct netlink_ext_ack *extack);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 8a33b3150d1a..43198d9c6d25 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1146,6 +1146,61 @@ static int devlink_nl_cmd_port_unsplit_doit(struct sk_buff *skb,
 	return devlink_port_unsplit(devlink, port_index, info->extack);
 }
 
+static int devlink_nl_cmd_port_new_doit(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct netlink_ext_ack *extack = info->extack;
+	struct devlink_port_new_attrs new_attrs = {};
+	struct devlink *devlink = info->user_ptr[0];
+
+	if (!info->attrs[DEVLINK_ATTR_PORT_FLAVOUR] ||
+	    !info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]) {
+		NL_SET_ERR_MSG_MOD(extack, "Port flavour or PCI PF are not specified");
+		return -EINVAL;
+	}
+	new_attrs.flavour = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_FLAVOUR]);
+	new_attrs.pfnum =
+		nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]);
+
+	if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) {
+		new_attrs.port_index =
+			nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]);
+		new_attrs.port_index_valid = true;
+	}
+	if (info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]) {
+		new_attrs.controller =
+			nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]);
+		new_attrs.controller_valid = true;
+	}
+	if (info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]) {
+		new_attrs.sfnum = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]);
+		new_attrs.sfnum_valid = true;
+	}
+
+	if (!devlink->ops->port_new)
+		return -EOPNOTSUPP;
+
+	return devlink->ops->port_new(devlink, &new_attrs, extack);
+}
+
+static int devlink_nl_cmd_port_del_doit(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct netlink_ext_ack *extack = info->extack;
+	struct devlink *devlink = info->user_ptr[0];
+	unsigned int port_index;
+
+	if (!info->attrs[DEVLINK_ATTR_PORT_INDEX]) {
+		NL_SET_ERR_MSG_MOD(extack, "Port index is not specified");
+		return -EINVAL;
+	}
+	port_index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]);
+
+	if (!devlink->ops->port_del)
+		return -EOPNOTSUPP;
+	return devlink->ops->port_del(devlink, port_index, extack);
+}
+
 static int devlink_nl_sb_fill(struct sk_buff *msg, struct devlink *devlink,
 			      struct devlink_sb *devlink_sb,
 			      enum devlink_command cmd, u32 portid,
@@ -7609,6 +7664,10 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_RELOAD_ACTION] = NLA_POLICY_RANGE(NLA_U8, DEVLINK_RELOAD_ACTION_DRIVER_REINIT,
 							DEVLINK_RELOAD_ACTION_MAX),
 	[DEVLINK_ATTR_RELOAD_LIMITS] = NLA_POLICY_BITFIELD32(DEVLINK_RELOAD_LIMITS_VALID_MASK),
+	[DEVLINK_ATTR_PORT_FLAVOUR] = { .type = NLA_U16 },
+	[DEVLINK_ATTR_PORT_PCI_PF_NUMBER] = { .type = NLA_U16 },
+	[DEVLINK_ATTR_PORT_PCI_SF_NUMBER] = { .type = NLA_U32 },
+	[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER] = { .type = NLA_U32 },
 };
 
 static const struct genl_small_ops devlink_nl_ops[] = {
@@ -7648,6 +7707,18 @@ static const struct genl_small_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NO_LOCK,
 	},
+	{
+		.cmd = DEVLINK_CMD_PORT_NEW,
+		.doit = devlink_nl_cmd_port_new_doit,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NO_LOCK,
+	},
+	{
+		.cmd = DEVLINK_CMD_PORT_DEL,
+		.doit = devlink_nl_cmd_port_del_doit,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NO_LOCK,
+	},
 	{
 		.cmd = DEVLINK_CMD_SB_GET,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-- 
2.26.2



* [PATCH net-next v2 04/14] devlink: Support get and set state of port function
From: saeed @ 2020-12-09  7:29 UTC
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit, Jiri Pirko, Vu Pham, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

devlink port function can be in active or inactive state.
Allow users to get and set port function's state.

When the port function is activated, its operational state may change
after a while, once the device is created and a driver binds to it.
The deactivation flow behaves similarly.

To clearly describe the state of the port function and its device's
operational state in the host system, define state and opstate
attributes.
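
The split between the two attributes can be pictured with a small userspace
model. This is a sketch under assumptions: the toy_ enum and function names
are hypothetical and are not the uapi definitions added by this patch. It
only shows that a user request changes the state at once, while the opstate
follows later when a driver binds or unbinds.

```c
#include <stdbool.h>

/* Hypothetical names; the patch defines its own uapi enums. */
enum toy_fn_state { TOY_FN_STATE_INACTIVE, TOY_FN_STATE_ACTIVE };
enum toy_fn_opstate { TOY_FN_OPSTATE_DETACHED, TOY_FN_OPSTATE_ATTACHED };

struct toy_port_fn {
	enum toy_fn_state state;	/* administrative state set by the user */
	enum toy_fn_opstate opstate;	/* operational state seen in the host */
};

/* User request: only the administrative state changes immediately. */
void toy_fn_set_state(struct toy_port_fn *fn, enum toy_fn_state state)
{
	fn->state = state;
}

/* Later event: a driver has bound to (or unbound from) the function's device. */
void toy_fn_driver_event(struct toy_port_fn *fn, bool bound)
{
	fn->opstate = bound ? TOY_FN_OPSTATE_ATTACHED : TOY_FN_OPSTATE_DETACHED;
}
```

In this model a port can briefly report state active with opstate detached,
which is the window the commit message describes between activation and
driver bind.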

Example of a PCI SF port which supports a port function:
Create a device with ID=10 and one physical port.

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show pci/0000:06:00.0/32768 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "external": false,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}
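
The transitions shown above can be modelled in a few lines of userspace C.
This is only a sketch: the enum values mirror the uapi enums added by this
patch, but struct port_fn and the port_fn_*() helpers are hypothetical names
that do not exist in the kernel. The deletion rule (inactive and detached)
is one reading of the opstate kernel-doc, not something devlink enforces.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror the uapi enums added by this patch. */
enum port_fn_state { PORT_FN_STATE_INACTIVE, PORT_FN_STATE_ACTIVE };
enum port_fn_opstate { PORT_FN_OPSTATE_DETACHED, PORT_FN_OPSTATE_ATTACHED };

/* Hypothetical userspace model of one port function. */
struct port_fn {
	enum port_fn_state state;	/* administrative state, set by the user */
	enum port_fn_opstate opstate;	/* operational state, follows driver binding */
};

/* User request: only the administrative state changes immediately. */
static void port_fn_set_state(struct port_fn *fn, enum port_fn_state state)
{
	fn->state = state;
}

/* Later, asynchronous step: a driver binds to or unbinds from the device. */
static void port_fn_driver_event(struct port_fn *fn, bool bound)
{
	fn->opstate = bound ? PORT_FN_OPSTATE_ATTACHED :
			      PORT_FN_OPSTATE_DETACHED;
}

/* Assumed graceful teardown rule: deactivate first, then wait for DETACHED. */
static bool port_fn_safe_to_delete(const struct port_fn *fn)
{
	return fn->state == PORT_FN_STATE_INACTIVE &&
	       fn->opstate == PORT_FN_OPSTATE_DETACHED;
}
```

In this model, setting the state to inactive does not by itself make the port
safe to delete; the caller still has to observe the opstate turning detached.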

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/net/devlink.h        | 23 +++++++++
 include/uapi/linux/devlink.h | 21 +++++++++
 net/core/devlink.c           | 90 +++++++++++++++++++++++++++++++++++-
 3 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index f8cff3e402da..18a7e66b7982 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1374,6 +1374,29 @@ struct devlink_ops {
 	int (*port_function_hw_addr_set)(struct devlink *devlink, struct devlink_port *port,
 					 const u8 *hw_addr, int hw_addr_len,
 					 struct netlink_ext_ack *extack);
+	/**
+	 * @port_function_state_get: Port function's state get function.
+	 *
+	 * Should be used by device drivers to report the state of a function
+	 * managed by the devlink port. Driver should return -EOPNOTSUPP if it
+	 * doesn't support port function handling for a particular port.
+	 */
+	int (*port_function_state_get)(struct devlink *devlink,
+				       struct devlink_port *port,
+				       enum devlink_port_function_state *state,
+				       enum devlink_port_function_opstate *opstate,
+				       struct netlink_ext_ack *extack);
+	/**
+	 * @port_function_state_set: Port function's state set function.
+	 *
+	 * Should be used by device drivers to set the state of a function
+	 * managed by the devlink port. Driver should return -EOPNOTSUPP if it
+	 * doesn't support port function handling for a particular port.
+	 */
+	int (*port_function_state_set)(struct devlink *devlink,
+				       struct devlink_port *port,
+				       enum devlink_port_function_state state,
+				       struct netlink_ext_ack *extack);
 	/**
 	 * @port_new: Port add function.
 	 *
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 6fe00f10eb3f..beeb30bb6b20 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -583,9 +583,30 @@ enum devlink_resource_unit {
 enum devlink_port_function_attr {
 	DEVLINK_PORT_FUNCTION_ATTR_UNSPEC,
 	DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR,	/* binary */
+	DEVLINK_PORT_FUNCTION_ATTR_STATE,	/* u8 */
+	DEVLINK_PORT_FUNCTION_ATTR_OPSTATE,	/* u8 */
 
 	__DEVLINK_PORT_FUNCTION_ATTR_MAX,
 	DEVLINK_PORT_FUNCTION_ATTR_MAX = __DEVLINK_PORT_FUNCTION_ATTR_MAX - 1
 };
 
+enum devlink_port_function_state {
+	DEVLINK_PORT_FUNCTION_STATE_INACTIVE,
+	DEVLINK_PORT_FUNCTION_STATE_ACTIVE,
+};
+
+/**
+ * enum devlink_port_function_opstate - indicates operational state of port function
+ * @DEVLINK_PORT_FUNCTION_OPSTATE_ATTACHED: Driver is attached to the function of port, for
+ *					    graceful tear down of the function, after
+ *					    inactivation of the port function, user should wait
+ *					    for operational state to turn DETACHED.
+ * @DEVLINK_PORT_FUNCTION_OPSTATE_DETACHED: Driver is detached from the function of port; it is
+ *					    safe to delete the port.
+ */
+enum devlink_port_function_opstate {
+	DEVLINK_PORT_FUNCTION_OPSTATE_DETACHED,
+	DEVLINK_PORT_FUNCTION_OPSTATE_ATTACHED,
+};
+
 #endif /* _UAPI_LINUX_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 43198d9c6d25..c7e3d3cdd5fa 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -87,6 +87,9 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report);
 
 static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_ATTR_MAX + 1] = {
 	[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] = { .type = NLA_BINARY },
+	[DEVLINK_PORT_FUNCTION_ATTR_STATE] =
+		NLA_POLICY_RANGE(NLA_U8, DEVLINK_PORT_FUNCTION_STATE_INACTIVE,
+				 DEVLINK_PORT_FUNCTION_STATE_ACTIVE),
 };
 
 static LIST_HEAD(devlink_list);
@@ -746,6 +749,57 @@ devlink_port_function_hw_addr_fill(struct devlink *devlink, const struct devlink
 	return 0;
 }
 
+static bool
+devlink_port_function_state_valid(enum devlink_port_function_state state)
+{
+	return state == DEVLINK_PORT_FUNCTION_STATE_INACTIVE ||
+	       state == DEVLINK_PORT_FUNCTION_STATE_ACTIVE;
+}
+
+static bool
+devlink_port_function_opstate_valid(enum devlink_port_function_opstate state)
+{
+	return state == DEVLINK_PORT_FUNCTION_OPSTATE_DETACHED ||
+	       state == DEVLINK_PORT_FUNCTION_OPSTATE_ATTACHED;
+}
+
+static int
+devlink_port_function_state_fill(struct devlink *devlink,
+				 const struct devlink_ops *ops,
+				 struct devlink_port *port, struct sk_buff *msg,
+				 struct netlink_ext_ack *extack,
+				 bool *msg_updated)
+{
+	enum devlink_port_function_opstate opstate;
+	enum devlink_port_function_state state;
+	int err;
+
+	if (!ops->port_function_state_get)
+		return 0;
+
+	err = ops->port_function_state_get(devlink, port, &state, &opstate, extack);
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			return 0;
+		return err;
+	}
+	if (!devlink_port_function_state_valid(state)) {
+		WARN_ON_ONCE(1);
+		NL_SET_ERR_MSG_MOD(extack, "Invalid state value read from driver");
+		return -EINVAL;
+	}
+	if (!devlink_port_function_opstate_valid(opstate)) {
+		WARN_ON_ONCE(1);
+		NL_SET_ERR_MSG_MOD(extack, "Invalid operational state value read from driver");
+		return -EINVAL;
+	}
+	if (nla_put_u8(msg, DEVLINK_PORT_FUNCTION_ATTR_STATE, state) ||
+	    nla_put_u8(msg, DEVLINK_PORT_FUNCTION_ATTR_OPSTATE, opstate))
+		return -EMSGSIZE;
+	*msg_updated = true;
+	return 0;
+}
+
 static int
 devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port,
 				   struct netlink_ext_ack *extack)
@@ -762,6 +816,13 @@ devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *por
 
 	ops = devlink->ops;
 	err = devlink_port_function_hw_addr_fill(devlink, ops, port, msg, extack, &msg_updated);
+	if (err)
+		goto out;
+	err = devlink_port_function_state_fill(devlink, ops, port, msg, extack,
+					       &msg_updated);
+	if (err)
+		goto out;
+out:
 	if (err || !msg_updated)
 		nla_nest_cancel(msg, function_attr);
 	else
@@ -1027,6 +1088,22 @@ devlink_port_function_hw_addr_set(struct devlink *devlink, struct devlink_port *
 	return ops->port_function_hw_addr_set(devlink, port, hw_addr, hw_addr_len, extack);
 }
 
+static int
+devlink_port_function_state_set(struct devlink *devlink, struct devlink_port *port,
+				const struct nlattr *attr, struct netlink_ext_ack *extack)
+{
+	enum devlink_port_function_state state;
+	const struct devlink_ops *ops;
+
+	state = nla_get_u8(attr);
+	ops = devlink->ops;
+	if (!ops->port_function_state_set) {
+		NL_SET_ERR_MSG_MOD(extack, "Port function does not support state setting");
+		return -EOPNOTSUPP;
+	}
+	return ops->port_function_state_set(devlink, port, state, extack);
+}
+
 static int
 devlink_port_function_set(struct devlink *devlink, struct devlink_port *port,
 			  const struct nlattr *attr, struct netlink_ext_ack *extack)
@@ -1042,8 +1119,19 @@ devlink_port_function_set(struct devlink *devlink, struct devlink_port *port,
 	}
 
 	attr = tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR];
-	if (attr)
+	if (attr) {
 		err = devlink_port_function_hw_addr_set(devlink, port, attr, extack);
+		if (err)
+			return err;
+	}
+	/* Keep this as the last function attribute set, so that when
+	 * multiple port function attributes are set along with state,
+	 * those can be applied first before activating the state.
+	 */
+	attr = tb[DEVLINK_PORT_FUNCTION_ATTR_STATE];
+	if (attr)
+		err = devlink_port_function_state_set(devlink, port, attr,
+						      extack);
 
 	if (!err)
 		devlink_port_notify(port, DEVLINK_CMD_PORT_NEW);
-- 
2.26.2



* [PATCH net-next v2 05/14] net/mlx5: Introduce vhca state event notifier
  2020-12-09  7:29 [PATCH net-next v2 00/14] Add mlx5 subfunction support saeed
                   ` (3 preceding siblings ...)
  2020-12-09  7:29 ` [PATCH net-next v2 04/14] devlink: Support get and set state of port function saeed
@ 2020-12-09  7:29 ` saeed
  2020-12-09  7:29 ` [PATCH net-next v2 06/14] net/mlx5: SF, Add auxiliary device support saeed
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: saeed @ 2020-12-09  7:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit, Vu Pham, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

vhca state events indicate a change in the state of the vhca that may
occur due to SF allocation, deallocation or enabling/disabling of the
SF HCA.

Introduce a vhca state event handler which will be used by the SF devlink
port manager and the SF hardware id allocator in subsequent patches
to act on the event.

This enables a single entity to subscribe to, query and rearm the event
for a function.
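
The flow of that single entity (query the state, rearm the event, then fan
out to subscribers) can be sketched as a userspace model. Everything below
is illustrative: struct fake_dev, fake_hw_change_state() and the vhca_*()
helpers are hypothetical stand-ins, not mlx5 APIs, and the one-shot arming
behaviour is an assumption inferred from the need to rearm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUBSCRIBERS 4

struct vhca_event {
	uint16_t function_id;
	uint8_t new_state;
};

typedef void (*vhca_cb)(const struct vhca_event *event, void *ctx);

/* Hypothetical stand-in for the device plus its notifier chain. */
struct fake_dev {
	uint8_t hw_state;	/* state that a query would return */
	bool armed;		/* whether the next change raises an event */
	vhca_cb cbs[MAX_SUBSCRIBERS];
	void *ctxs[MAX_SUBSCRIBERS];
	int n_cbs;
};

static int vhca_subscribe(struct fake_dev *dev, vhca_cb cb, void *ctx)
{
	if (dev->n_cbs == MAX_SUBSCRIBERS)
		return -1;
	dev->cbs[dev->n_cbs] = cb;
	dev->ctxs[dev->n_cbs] = ctx;
	dev->n_cbs++;
	return 0;
}

/*
 * Device side of the model: change the state and disarm (assumed one-shot).
 * Returns true if an event would have been raised.
 */
static bool fake_hw_change_state(struct fake_dev *dev, uint8_t state)
{
	bool raised = dev->armed;

	dev->hw_state = state;
	dev->armed = false;
	return raised;
}

/* The single handler: query, rearm, then notify every subscriber. */
static void vhca_handle_event(struct fake_dev *dev, uint16_t function_id)
{
	struct vhca_event event = { .function_id = function_id };
	int i;

	event.new_state = dev->hw_state;	/* query */
	dev->armed = true;			/* rearm */
	for (i = 0; i < dev->n_cbs; i++)
		dev->cbs[i](&event, dev->ctxs[i]);
}

/* Example subscriber: remember the last state seen. */
static void record_last_state(const struct vhca_event *event, void *ctx)
{
	*(uint8_t *)ctx = event->new_state;
}
```

Because only vhca_handle_event() queries and rearms, subscribers in this model
never race each other on the arm bit; they only consume the event.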

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
Changelog:
v1->v2:
 - new patch
---
 .../net/ethernet/mellanox/mlx5/core/Kconfig   |   9 +
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   4 +
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |   4 +
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |   3 +
 .../net/ethernet/mellanox/mlx5/core/events.c  |   7 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |  16 ++
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   2 +
 .../mlx5/core/sf/mlx5_ifc_vhca_event.h        |  82 ++++++++
 .../net/ethernet/mellanox/mlx5/core/sf/sf.h   |  45 +++++
 .../mellanox/mlx5/core/sf/vhca_event.c        | 189 ++++++++++++++++++
 .../mellanox/mlx5/core/sf/vhca_event.h        |  57 ++++++
 include/linux/mlx5/driver.h                   |   4 +
 12 files changed, 422 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/mlx5_ifc_vhca_event.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 485478979b1a..10dfaf671c90 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -202,3 +202,12 @@ config MLX5_SW_STEERING
 	default y
 	help
 	Build support for software-managed steering in the NIC.
+
+config MLX5_SF
+	bool "Mellanox Technologies subfunction device support using auxiliary device"
+	depends on MLX5_CORE && MLX5_CORE_EN
+	default n
+	help
+	Build support for subfunction device in the NIC. A Mellanox subfunction
+	device can support RDMA, netdevice and vdpa device.
+	It is similar to a SRIOV VF but it doesn't require SRIOV support.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 83a67ca43a41..3abb286dbbea 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -85,3 +85,7 @@ mlx5_core-$(CONFIG_MLX5_SW_STEERING) += steering/dr_domain.o steering/dr_table.o
 					steering/dr_ste.o steering/dr_send.o \
 					steering/dr_cmd.o steering/dr_fw.o \
 					steering/dr_action.o steering/fs_dr.o
+#
+# SF device
+#
+mlx5_core-$(CONFIG_MLX5_SF) += sf/vhca_event.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 50c7b9ee80c3..47dcc3ac2cf0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -464,6 +464,8 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_ALLOC_MEMIC:
 	case MLX5_CMD_OP_MODIFY_XRQ:
 	case MLX5_CMD_OP_RELEASE_XRQ_ERROR:
+	case MLX5_CMD_OP_QUERY_VHCA_STATE:
+	case MLX5_CMD_OP_MODIFY_VHCA_STATE:
 		*status = MLX5_DRIVER_STATUS_ABORTED;
 		*synd = MLX5_DRIVER_SYND;
 		return -EIO;
@@ -657,6 +659,8 @@ const char *mlx5_command_str(int command)
 	MLX5_COMMAND_STR_CASE(DESTROY_UMEM);
 	MLX5_COMMAND_STR_CASE(RELEASE_XRQ_ERROR);
 	MLX5_COMMAND_STR_CASE(MODIFY_XRQ);
+	MLX5_COMMAND_STR_CASE(QUERY_VHCA_STATE);
+	MLX5_COMMAND_STR_CASE(MODIFY_VHCA_STATE);
 	default: return "unknown command opcode";
 	}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 4ea5d6ddf56a..f3e1d5b785ce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -597,6 +597,9 @@ static void gather_async_events_mask(struct mlx5_core_dev *dev, u64 mask[4])
 		async_event_mask |=
 			(1ull << MLX5_EVENT_TYPE_ESW_FUNCTIONS_CHANGED);
 
+	if (MLX5_CAP_GEN_MAX(dev, vhca_state))
+		async_event_mask |= (1ull << MLX5_EVENT_TYPE_VHCA_STATE_CHANGE);
+
 	mask[0] = async_event_mask;
 
 	if (MLX5_CAP_GEN(dev, event_cap))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index 3ce17c3d7a00..5523d218e5fb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -110,6 +110,8 @@ static const char *eqe_type_str(u8 type)
 		return "MLX5_EVENT_TYPE_CMD";
 	case MLX5_EVENT_TYPE_ESW_FUNCTIONS_CHANGED:
 		return "MLX5_EVENT_TYPE_ESW_FUNCTIONS_CHANGED";
+	case MLX5_EVENT_TYPE_VHCA_STATE_CHANGE:
+		return "MLX5_EVENT_TYPE_VHCA_STATE_CHANGE";
 	case MLX5_EVENT_TYPE_PAGE_REQUEST:
 		return "MLX5_EVENT_TYPE_PAGE_REQUEST";
 	case MLX5_EVENT_TYPE_PAGE_FAULT:
@@ -403,3 +405,8 @@ int mlx5_notifier_call_chain(struct mlx5_events *events, unsigned int event, voi
 {
 	return atomic_notifier_call_chain(&events->nh, event, data);
 }
+
+void mlx5_events_work_enqueue(struct mlx5_core_dev *dev, struct work_struct *work)
+{
+	queue_work(dev->priv.events->wq, work);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index c08315b51fd3..6e67ad11c713 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -73,6 +73,7 @@
 #include "ecpf.h"
 #include "lib/hv_vhca.h"
 #include "diag/rsc_dump.h"
+#include "sf/vhca_event.h"
 
 MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
 MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) core driver");
@@ -567,6 +568,8 @@ static int handle_hca_cap(struct mlx5_core_dev *dev, void *set_ctx)
 	if (MLX5_CAP_GEN_MAX(dev, mkey_by_name))
 		MLX5_SET(cmd_hca_cap, set_hca_cap, mkey_by_name, 1);
 
+	mlx5_vhca_state_cap_handle(dev, set_hca_cap);
+
 	return set_caps(dev, set_ctx, MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE);
 }
 
@@ -884,6 +887,12 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 		goto err_eswitch_cleanup;
 	}
 
+	err = mlx5_vhca_event_init(dev);
+	if (err) {
+		mlx5_core_err(dev, "Failed to init vhca event notifier %d\n", err);
+		goto err_fpga_cleanup;
+	}
+
 	dev->dm = mlx5_dm_create(dev);
 	if (IS_ERR(dev->dm))
 		mlx5_core_warn(dev, "Failed to init device memory%d\n", err);
@@ -894,6 +903,8 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 
 	return 0;
 
+err_fpga_cleanup:
+	mlx5_fpga_cleanup(dev);
 err_eswitch_cleanup:
 	mlx5_eswitch_cleanup(dev->priv.eswitch);
 err_sriov_cleanup:
@@ -925,6 +936,7 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 	mlx5_hv_vhca_destroy(dev->hv_vhca);
 	mlx5_fw_tracer_destroy(dev->tracer);
 	mlx5_dm_cleanup(dev);
+	mlx5_vhca_event_cleanup(dev);
 	mlx5_fpga_cleanup(dev);
 	mlx5_eswitch_cleanup(dev->priv.eswitch);
 	mlx5_sriov_cleanup(dev);
@@ -1129,6 +1141,8 @@ static int mlx5_load(struct mlx5_core_dev *dev)
 		goto err_sriov;
 	}
 
+	mlx5_vhca_event_start(dev);
+
 	err = mlx5_ec_init(dev);
 	if (err) {
 		mlx5_core_err(dev, "Failed to init embedded CPU\n");
@@ -1146,6 +1160,7 @@ static int mlx5_load(struct mlx5_core_dev *dev)
 err_sriov:
 	mlx5_ec_cleanup(dev);
 err_ec:
+	mlx5_vhca_event_stop(dev);
 	mlx5_cleanup_fs(dev);
 err_fs:
 	mlx5_accel_tls_cleanup(dev);
@@ -1173,6 +1188,7 @@ static void mlx5_unload(struct mlx5_core_dev *dev)
 {
 	mlx5_sriov_detach(dev);
 	mlx5_ec_cleanup(dev);
+	mlx5_vhca_event_stop(dev);
 	mlx5_cleanup_fs(dev);
 	mlx5_accel_ipsec_cleanup(dev);
 	mlx5_accel_tls_cleanup(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 0a0302ce7144..a33b7496d748 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -259,4 +259,6 @@ void mlx5_set_nic_state(struct mlx5_core_dev *dev, u8 state);
 
 void mlx5_unload_one(struct mlx5_core_dev *dev, bool cleanup);
 int mlx5_load_one(struct mlx5_core_dev *dev, bool boot);
+
+void mlx5_events_work_enqueue(struct mlx5_core_dev *dev, struct work_struct *work);
 #endif /* __MLX5_CORE_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/mlx5_ifc_vhca_event.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/mlx5_ifc_vhca_event.h
new file mode 100644
index 000000000000..1daf5a122ba3
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/mlx5_ifc_vhca_event.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies Ltd */
+
+#ifndef __MLX5_IFC_VHCA_EVENT_H__
+#define __MLX5_IFC_VHCA_EVENT_H__
+
+enum mlx5_ifc_vhca_state {
+	MLX5_VHCA_STATE_INVALID = 0x0,
+	MLX5_VHCA_STATE_ALLOCATED = 0x1,
+	MLX5_VHCA_STATE_ACTIVE = 0x2,
+	MLX5_VHCA_STATE_IN_USE = 0x3,
+	MLX5_VHCA_STATE_TEARDOWN_REQUEST = 0x4,
+};
+
+struct mlx5_ifc_vhca_state_context_bits {
+	u8         arm_change_event[0x1];
+	u8         reserved_at_1[0xb];
+	u8         vhca_state[0x4];
+	u8         reserved_at_10[0x10];
+
+	u8         sw_function_id[0x20];
+
+	u8         reserved_at_40[0x80];
+};
+
+struct mlx5_ifc_query_vhca_state_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x40];
+
+	struct mlx5_ifc_vhca_state_context_bits vhca_state_context;
+};
+
+struct mlx5_ifc_query_vhca_state_in_bits {
+	u8         opcode[0x10];
+	u8         uid[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         embedded_cpu_function[0x1];
+	u8         reserved_at_41[0xf];
+	u8         function_id[0x10];
+
+	u8         reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_vhca_state_field_select_bits {
+	u8         reserved_at_0[0x1e];
+	u8         sw_function_id[0x1];
+	u8         arm_change_event[0x1];
+};
+
+struct mlx5_ifc_modify_vhca_state_out_bits {
+	u8         status[0x8];
+	u8         reserved_at_8[0x18];
+
+	u8         syndrome[0x20];
+
+	u8         reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_modify_vhca_state_in_bits {
+	u8         opcode[0x10];
+	u8         uid[0x10];
+
+	u8         reserved_at_20[0x10];
+	u8         op_mod[0x10];
+
+	u8         embedded_cpu_function[0x1];
+	u8         reserved_at_41[0xf];
+	u8         function_id[0x10];
+
+	struct mlx5_ifc_vhca_state_field_select_bits vhca_state_field_select;
+
+	struct mlx5_ifc_vhca_state_context_bits vhca_state_context;
+};
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h
new file mode 100644
index 000000000000..623191679b49
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies Ltd */
+
+#ifndef __MLX5_SF_H__
+#define __MLX5_SF_H__
+
+#include <linux/mlx5/driver.h>
+
+static inline u16 mlx5_sf_start_function_id(const struct mlx5_core_dev *dev)
+{
+	return MLX5_CAP_GEN(dev, sf_base_id);
+}
+
+#ifdef CONFIG_MLX5_SF
+
+static inline bool mlx5_sf_supported(const struct mlx5_core_dev *dev)
+{
+	return MLX5_CAP_GEN(dev, sf);
+}
+
+static inline u16 mlx5_sf_max_functions(const struct mlx5_core_dev *dev)
+{
+	if (!mlx5_sf_supported(dev))
+		return 0;
+	if (MLX5_CAP_GEN(dev, max_num_sf))
+		return MLX5_CAP_GEN(dev, max_num_sf);
+	else
+		return 1 << MLX5_CAP_GEN(dev, log_max_sf);
+}
+
+#else
+
+static inline bool mlx5_sf_supported(const struct mlx5_core_dev *dev)
+{
+	return false;
+}
+
+static inline u16 mlx5_sf_max_functions(const struct mlx5_core_dev *dev)
+{
+	return 0;
+}
+
+#endif
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.c
new file mode 100644
index 000000000000..af2f2dd9db25
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.c
@@ -0,0 +1,189 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd */
+
+#include <linux/mlx5/driver.h>
+#include "mlx5_ifc_vhca_event.h"
+#include "mlx5_core.h"
+#include "vhca_event.h"
+#include "ecpf.h"
+
+struct mlx5_vhca_state_notifier {
+	struct mlx5_core_dev *dev;
+	struct mlx5_nb nb;
+	struct blocking_notifier_head n_head;
+};
+
+struct mlx5_vhca_event_work {
+	struct work_struct work;
+	struct mlx5_vhca_state_notifier *notifier;
+	struct mlx5_vhca_state_event event;
+};
+
+int mlx5_cmd_query_vhca_state(struct mlx5_core_dev *dev, u16 function_id,
+			      bool ecpu, u32 *out, u32 outlen)
+{
+	u32 in[MLX5_ST_SZ_DW(query_vhca_state_in)] = {};
+
+	MLX5_SET(query_vhca_state_in, in, opcode, MLX5_CMD_OP_QUERY_VHCA_STATE);
+	MLX5_SET(query_vhca_state_in, in, function_id, function_id);
+	MLX5_SET(query_vhca_state_in, in, embedded_cpu_function, ecpu);
+
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, outlen);
+}
+
+static int mlx5_cmd_modify_vhca_state(struct mlx5_core_dev *dev, u16 function_id,
+				      bool ecpu, u32 *in, u32 inlen)
+{
+	u32 out[MLX5_ST_SZ_DW(modify_vhca_state_out)] = {};
+
+	MLX5_SET(modify_vhca_state_in, in, opcode, MLX5_CMD_OP_MODIFY_VHCA_STATE);
+	MLX5_SET(modify_vhca_state_in, in, function_id, function_id);
+	MLX5_SET(modify_vhca_state_in, in, embedded_cpu_function, ecpu);
+
+	return mlx5_cmd_exec(dev, in, inlen, out, sizeof(out));
+}
+
+int mlx5_modify_vhca_sw_id(struct mlx5_core_dev *dev, u16 function_id, bool ecpu, u32 sw_fn_id)
+{
+	u32 out[MLX5_ST_SZ_DW(modify_vhca_state_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(modify_vhca_state_in)] = {};
+
+	MLX5_SET(modify_vhca_state_in, in, opcode, MLX5_CMD_OP_MODIFY_VHCA_STATE);
+	MLX5_SET(modify_vhca_state_in, in, function_id, function_id);
+	MLX5_SET(modify_vhca_state_in, in, embedded_cpu_function, ecpu);
+	MLX5_SET(modify_vhca_state_in, in, vhca_state_field_select.sw_function_id, 1);
+	MLX5_SET(modify_vhca_state_in, in, vhca_state_context.sw_function_id, sw_fn_id);
+
+	return mlx5_cmd_exec_inout(dev, modify_vhca_state, in, out);
+}
+
+int mlx5_vhca_event_arm(struct mlx5_core_dev *dev, u16 function_id, bool ecpu)
+{
+	u32 in[MLX5_ST_SZ_DW(modify_vhca_state_in)] = {};
+
+	MLX5_SET(modify_vhca_state_in, in, vhca_state_context.arm_change_event, 1);
+	MLX5_SET(modify_vhca_state_in, in, vhca_state_field_select.arm_change_event, 1);
+
+	return mlx5_cmd_modify_vhca_state(dev, function_id, ecpu, in, sizeof(in));
+}
+
+static void
+mlx5_vhca_event_notify(struct mlx5_core_dev *dev, struct mlx5_vhca_state_event *event)
+{
+	u32 out[MLX5_ST_SZ_DW(query_vhca_state_out)] = {};
+	int err;
+
+	err = mlx5_cmd_query_vhca_state(dev, event->function_id, event->ecpu, out, sizeof(out));
+	if (err)
+		return;
+
+	event->sw_function_id = MLX5_GET(query_vhca_state_out, out,
+					 vhca_state_context.sw_function_id);
+	event->new_vhca_state = MLX5_GET(query_vhca_state_out, out,
+					 vhca_state_context.vhca_state);
+
+	mlx5_vhca_event_arm(dev, event->function_id, event->ecpu);
+
+	blocking_notifier_call_chain(&dev->priv.vhca_state_notifier->n_head, 0, event);
+}
+
+static void mlx5_vhca_state_work_handler(struct work_struct *_work)
+{
+	struct mlx5_vhca_event_work *work = container_of(_work, struct mlx5_vhca_event_work, work);
+	struct mlx5_vhca_state_notifier *notifier = work->notifier;
+	struct mlx5_core_dev *dev = notifier->dev;
+
+	mlx5_vhca_event_notify(dev, &work->event);
+}
+
+static int
+mlx5_vhca_state_change_notifier(struct notifier_block *nb, unsigned long type, void *data)
+{
+	struct mlx5_vhca_state_notifier *notifier =
+				mlx5_nb_cof(nb, struct mlx5_vhca_state_notifier, nb);
+	struct mlx5_vhca_event_work *work;
+	struct mlx5_eqe *eqe = data;
+
+	work = kzalloc(sizeof(*work), GFP_ATOMIC);
+	if (!work)
+		return NOTIFY_DONE;
+	INIT_WORK(&work->work, &mlx5_vhca_state_work_handler);
+	work->notifier = notifier;
+	work->event.function_id = be16_to_cpu(eqe->data.vhca_state.function_id);
+	work->event.ecpu = be16_to_cpu(eqe->data.vhca_state.ec_function);
+	mlx5_events_work_enqueue(notifier->dev, &work->work);
+	return NOTIFY_OK;
+}
+
+void mlx5_vhca_state_cap_handle(struct mlx5_core_dev *dev, void *set_hca_cap)
+{
+	if (!mlx5_vhca_event_supported(dev))
+		return;
+
+	MLX5_SET(cmd_hca_cap, set_hca_cap, vhca_state, 1);
+	MLX5_SET(cmd_hca_cap, set_hca_cap, event_on_vhca_state_allocated, 1);
+	MLX5_SET(cmd_hca_cap, set_hca_cap, event_on_vhca_state_active, 1);
+	MLX5_SET(cmd_hca_cap, set_hca_cap, event_on_vhca_state_in_use, 1);
+	MLX5_SET(cmd_hca_cap, set_hca_cap, event_on_vhca_state_teardown_request, 1);
+}
+
+int mlx5_vhca_event_init(struct mlx5_core_dev *dev)
+{
+	struct mlx5_vhca_state_notifier *notifier;
+
+	if (!mlx5_vhca_event_supported(dev))
+		return 0;
+
+	notifier = kzalloc(sizeof(*notifier), GFP_KERNEL);
+	if (!notifier)
+		return -ENOMEM;
+
+	dev->priv.vhca_state_notifier = notifier;
+	notifier->dev = dev;
+	BLOCKING_INIT_NOTIFIER_HEAD(&notifier->n_head);
+	MLX5_NB_INIT(&notifier->nb, mlx5_vhca_state_change_notifier, VHCA_STATE_CHANGE);
+	return 0;
+}
+
+void mlx5_vhca_event_cleanup(struct mlx5_core_dev *dev)
+{
+	if (!mlx5_vhca_event_supported(dev))
+		return;
+
+	kfree(dev->priv.vhca_state_notifier);
+	dev->priv.vhca_state_notifier = NULL;
+}
+
+void mlx5_vhca_event_start(struct mlx5_core_dev *dev)
+{
+	struct mlx5_vhca_state_notifier *notifier;
+
+	if (!dev->priv.vhca_state_notifier)
+		return;
+
+	notifier = dev->priv.vhca_state_notifier;
+	mlx5_eq_notifier_register(dev, &notifier->nb);
+}
+
+void mlx5_vhca_event_stop(struct mlx5_core_dev *dev)
+{
+	struct mlx5_vhca_state_notifier *notifier;
+
+	if (!dev->priv.vhca_state_notifier)
+		return;
+
+	notifier = dev->priv.vhca_state_notifier;
+	mlx5_eq_notifier_unregister(dev, &notifier->nb);
+}
+
+int mlx5_vhca_event_notifier_register(struct mlx5_core_dev *dev, struct notifier_block *nb)
+{
+	if (!dev->priv.vhca_state_notifier)
+		return -EOPNOTSUPP;
+	return blocking_notifier_chain_register(&dev->priv.vhca_state_notifier->n_head, nb);
+}
+
+void mlx5_vhca_event_notifier_unregister(struct mlx5_core_dev *dev, struct notifier_block *nb)
+{
+	blocking_notifier_chain_unregister(&dev->priv.vhca_state_notifier->n_head, nb);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.h
new file mode 100644
index 000000000000..1fe1ec6f4d4b
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies Ltd */
+
+#ifndef __MLX5_VHCA_EVENT_H__
+#define __MLX5_VHCA_EVENT_H__
+
+#ifdef CONFIG_MLX5_SF
+
+struct mlx5_vhca_state_event {
+	u16 function_id;
+	u16 sw_function_id;
+	u8 new_vhca_state;
+	bool ecpu;
+};
+
+static inline bool mlx5_vhca_event_supported(const struct mlx5_core_dev *dev)
+{
+	return MLX5_CAP_GEN_MAX(dev, vhca_state);
+}
+
+void mlx5_vhca_state_cap_handle(struct mlx5_core_dev *dev, void *set_hca_cap);
+int mlx5_vhca_event_init(struct mlx5_core_dev *dev);
+void mlx5_vhca_event_cleanup(struct mlx5_core_dev *dev);
+void mlx5_vhca_event_start(struct mlx5_core_dev *dev);
+void mlx5_vhca_event_stop(struct mlx5_core_dev *dev);
+int mlx5_vhca_event_notifier_register(struct mlx5_core_dev *dev, struct notifier_block *nb);
+void mlx5_vhca_event_notifier_unregister(struct mlx5_core_dev *dev, struct notifier_block *nb);
+int mlx5_modify_vhca_sw_id(struct mlx5_core_dev *dev, u16 function_id, bool ecpu, u32 sw_fn_id);
+int mlx5_vhca_event_arm(struct mlx5_core_dev *dev, u16 function_id, bool ecpu);
+int mlx5_cmd_query_vhca_state(struct mlx5_core_dev *dev, u16 function_id,
+			      bool ecpu, u32 *out, u32 outlen);
+#else
+
+static inline void mlx5_vhca_state_cap_handle(struct mlx5_core_dev *dev, void *set_hca_cap)
+{
+}
+
+static inline int mlx5_vhca_event_init(struct mlx5_core_dev *dev)
+{
+	return 0;
+}
+
+static inline void mlx5_vhca_event_cleanup(struct mlx5_core_dev *dev)
+{
+}
+
+static inline void mlx5_vhca_event_start(struct mlx5_core_dev *dev)
+{
+}
+
+static inline void mlx5_vhca_event_stop(struct mlx5_core_dev *dev)
+{
+}
+
+#endif
+
+#endif
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index f93bfe7473aa..ffba0786051e 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -507,6 +507,7 @@ struct mlx5_devcom;
 struct mlx5_fw_reset;
 struct mlx5_eq_table;
 struct mlx5_irq_table;
+struct mlx5_vhca_state_notifier;
 
 struct mlx5_rate_limit {
 	u32			rate;
@@ -603,6 +604,9 @@ struct mlx5_priv {
 
 	struct mlx5_bfreg_data		bfregs;
 	struct mlx5_uars_page	       *uar;
+#ifdef CONFIG_MLX5_SF
+	struct mlx5_vhca_state_notifier *vhca_state_notifier;
+#endif
 };
 
 enum mlx5_device_state {
-- 
2.26.2



* [PATCH net-next v2 06/14] net/mlx5: SF, Add auxiliary device support
  2020-12-09  7:29 [PATCH net-next v2 00/14] Add mlx5 subfunction support saeed
                   ` (4 preceding siblings ...)
  2020-12-09  7:29 ` [PATCH net-next v2 05/14] net/mlx5: Introduce vhca state event notifier saeed
@ 2020-12-09  7:29 ` saeed
  2020-12-09  7:29 ` [PATCH net-next v2 07/14] net/mlx5: SF, Add auxiliary device driver saeed
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: saeed @ 2020-12-09  7:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit, Vu Pham, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

Introduce an API to add and delete an auxiliary device for an SF.
Each SF has its own dedicated window in PCI BAR 2.

An SF device is similar to a PCI PF or VF in that it supports multiple
classes of devices such as net, rdma and vdpa.

The SF device will be added or removed in a subsequent patch, during the SF
devlink port function state change command.

A subfunction device exposes the user-supplied subfunction number, which
systemd/udev will use to derive deterministic names for its
netdevice and rdma device.

An mlx5 subfunction auxiliary device example:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88

$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88 state active

On activation,

$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4

$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
Changelog:
v1->v2:
 - uses vhca event to add/remove sf device
---
 .../device_drivers/ethernet/mellanox/mlx5.rst |   5 +
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/main.c    |   4 +
 .../ethernet/mellanox/mlx5/core/sf/dev/dev.c  | 261 ++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/sf/dev/dev.h  |  35 +++
 include/linux/mlx5/driver.h                   |   2 +
 6 files changed, 308 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
index e9b65035cd47..a5eb22793bb9 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
@@ -97,6 +97,11 @@ Enabling the driver and kconfig options
 
 |   Provides low-level InfiniBand/RDMA and `RoCE <https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment>`_ support.
 
+**CONFIG_MLX5_SF=(y/n)**
+
+|   Build support for subfunction.
+|   Subfunctions are more lightweight than PCI SRIOV VFs. Choosing this option
+|   will enable support for creating subfunction devices.
 
 **External options** ( Choose if the corresponding mlx5 feature is required )
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 3abb286dbbea..2f9dede697dc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -88,4 +88,4 @@ mlx5_core-$(CONFIG_MLX5_SW_STEERING) += steering/dr_domain.o steering/dr_table.o
 #
 # SF device
 #
-mlx5_core-$(CONFIG_MLX5_SF) += sf/vhca_event.o
+mlx5_core-$(CONFIG_MLX5_SF) += sf/vhca_event.o sf/dev/dev.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 6e67ad11c713..292c30e71d7f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -74,6 +74,7 @@
 #include "lib/hv_vhca.h"
 #include "diag/rsc_dump.h"
 #include "sf/vhca_event.h"
+#include "sf/dev/dev.h"
 
 MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
 MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) core driver");
@@ -1155,6 +1156,8 @@ static int mlx5_load(struct mlx5_core_dev *dev)
 		goto err_sriov;
 	}
 
+	mlx5_sf_dev_table_create(dev);
+
 	return 0;
 
 err_sriov:
@@ -1186,6 +1189,7 @@ static int mlx5_load(struct mlx5_core_dev *dev)
 
 static void mlx5_unload(struct mlx5_core_dev *dev)
 {
+	mlx5_sf_dev_table_destroy(dev);
 	mlx5_sriov_detach(dev);
 	mlx5_ec_cleanup(dev);
 	mlx5_vhca_event_stop(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
new file mode 100644
index 000000000000..6562bf63afaa
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
@@ -0,0 +1,261 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd */
+
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/device.h>
+#include "mlx5_core.h"
+#include "dev.h"
+#include "sf/vhca_event.h"
+#include "sf/sf.h"
+#include "sf/mlx5_ifc_vhca_event.h"
+#include "ecpf.h"
+
+struct mlx5_sf_dev_table {
+	struct xarray devices;
+	unsigned int max_sfs;
+	phys_addr_t base_address;
+	u64 sf_bar_length;
+	struct notifier_block nb;
+	struct mlx5_core_dev *dev;
+};
+
+static bool mlx5_sf_dev_supported(const struct mlx5_core_dev *dev)
+{
+	return MLX5_CAP_GEN(dev, sf) && mlx5_vhca_event_supported(dev);
+}
+
+static ssize_t sfnum_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct auxiliary_device *adev = container_of(dev, struct auxiliary_device, dev);
+	struct mlx5_sf_dev *sf_dev = container_of(adev, struct mlx5_sf_dev, adev);
+
+	return scnprintf(buf, PAGE_SIZE, "%u\n", sf_dev->sfnum);
+}
+static DEVICE_ATTR_RO(sfnum);
+
+static struct attribute *sf_device_attrs[] = {
+	&dev_attr_sfnum.attr,
+	NULL,
+};
+
+static const struct attribute_group sf_attr_group = {
+	.attrs = sf_device_attrs,
+};
+
+static const struct attribute_group *sf_attr_groups[2] = {
+	&sf_attr_group,
+	NULL
+};
+
+static void mlx5_sf_dev_release(struct device *device)
+{
+	struct auxiliary_device *adev = container_of(device, struct auxiliary_device, dev);
+	struct mlx5_sf_dev *sf_dev = container_of(adev, struct mlx5_sf_dev, adev);
+
+	mlx5_adev_idx_free(adev->id);
+	kfree(sf_dev);
+}
+
+static void mlx5_sf_dev_remove(struct mlx5_sf_dev *sf_dev)
+{
+	auxiliary_device_delete(&sf_dev->adev);
+	auxiliary_device_uninit(&sf_dev->adev);
+}
+
+static void mlx5_sf_dev_add(struct mlx5_core_dev *dev, u16 sf_index, u32 sfnum)
+{
+	struct mlx5_sf_dev_table *table = dev->priv.sf_dev_table;
+	struct mlx5_sf_dev *sf_dev;
+	struct pci_dev *pdev;
+	int err;
+	int id;
+
+	id = mlx5_adev_idx_alloc();
+	if (id < 0) {
+		err = id;
+		goto add_err;
+	}
+
+	sf_dev = kzalloc(sizeof(*sf_dev), GFP_KERNEL);
+	if (!sf_dev) {
+		mlx5_adev_idx_free(id);
+		err = -ENOMEM;
+		goto add_err;
+	}
+	pdev = dev->pdev;
+	sf_dev->adev.id = id;
+	sf_dev->adev.name = MLX5_SF_DEV_ID_NAME;
+	sf_dev->adev.dev.release = mlx5_sf_dev_release;
+	sf_dev->adev.dev.parent = &pdev->dev;
+	sf_dev->adev.dev.groups = sf_attr_groups;
+	sf_dev->sfnum = sfnum;
+	sf_dev->parent_mdev = dev;
+
+	if (!table->max_sfs) {
+		mlx5_adev_idx_free(id);
+		kfree(sf_dev);
+		err = -EOPNOTSUPP;
+		goto add_err;
+	}
+	sf_dev->bar_base_addr = table->base_address + (sf_index * table->sf_bar_length);
+
+	err = auxiliary_device_init(&sf_dev->adev);
+	if (err) {
+		mlx5_adev_idx_free(id);
+		kfree(sf_dev);
+		goto add_err;
+	}
+
+	err = auxiliary_device_add(&sf_dev->adev);
+	if (err) {
+		put_device(&sf_dev->adev.dev);
+		goto add_err;
+	}
+
+	err = xa_insert(&table->devices, sf_index, sf_dev, GFP_KERNEL);
+	if (err)
+		goto xa_err;
+	return;
+
+xa_err:
+	mlx5_sf_dev_remove(sf_dev);
+add_err:
+	mlx5_core_err(dev, "SF DEV: fail device add for index=%d sfnum=%d err=%d\n",
+		      sf_index, sfnum, err);
+}
+
+static void mlx5_sf_dev_del(struct mlx5_core_dev *dev, struct mlx5_sf_dev *sf_dev, u16 sf_index)
+{
+	struct mlx5_sf_dev_table *table = dev->priv.sf_dev_table;
+
+	xa_erase(&table->devices, sf_index);
+	mlx5_sf_dev_remove(sf_dev);
+}
+
+static int
+mlx5_sf_dev_state_change_handler(struct notifier_block *nb, unsigned long event_code, void *data)
+{
+	struct mlx5_sf_dev_table *table = container_of(nb, struct mlx5_sf_dev_table, nb);
+	const struct mlx5_vhca_state_event *event = data;
+	struct mlx5_sf_dev *sf_dev;
+	u16 sf_index;
+
+	sf_index = event->function_id - MLX5_CAP_GEN(table->dev, sf_base_id);
+	sf_dev = xa_load(&table->devices, sf_index);
+	switch (event->new_vhca_state) {
+	case MLX5_VHCA_STATE_TEARDOWN_REQUEST:
+		if (sf_dev)
+			mlx5_sf_dev_del(table->dev, sf_dev, sf_index);
+		else
+			mlx5_core_err(table->dev,
+				      "SF DEV: teardown state for invalid dev index=%d fn_id=0x%x\n",
+				      sf_index, event->sw_function_id);
+		break;
+	case MLX5_VHCA_STATE_ACTIVE:
+		if (!sf_dev)
+			mlx5_sf_dev_add(table->dev, sf_index, event->sw_function_id);
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static int mlx5_sf_dev_vhca_arm_all(struct mlx5_sf_dev_table *table)
+{
+	struct mlx5_core_dev *dev = table->dev;
+	u16 max_functions;
+	u16 function_id;
+	int err = 0;
+	bool ecpu;
+	int i;
+
+	max_functions = mlx5_sf_max_functions(dev);
+	function_id = MLX5_CAP_GEN(dev, sf_base_id);
+	ecpu = mlx5_read_embedded_cpu(dev);
+	/* Arm the vhca context as the vhca event notifier */
+	for (i = 0; i < max_functions; i++) {
+		err = mlx5_vhca_event_arm(dev, function_id, ecpu);
+		if (err)
+			return err;
+
+		function_id++;
+	}
+	return 0;
+}
+
+void mlx5_sf_dev_table_create(struct mlx5_core_dev *dev)
+{
+	struct mlx5_sf_dev_table *table;
+	unsigned int max_sfs;
+	int err;
+
+	if (!mlx5_sf_dev_supported(dev))
+		return;
+
+	table = kzalloc(sizeof(*table), GFP_KERNEL);
+	if (!table) {
+		err = -ENOMEM;
+		goto table_err;
+	}
+
+	table->nb.notifier_call = mlx5_sf_dev_state_change_handler;
+	table->dev = dev;
+	if (MLX5_CAP_GEN(dev, max_num_sf))
+		max_sfs = MLX5_CAP_GEN(dev, max_num_sf);
+	else
+		max_sfs = 1 << MLX5_CAP_GEN(dev, log_max_sf);
+	table->sf_bar_length = 1ULL << (MLX5_CAP_GEN(dev, log_min_sf_size) + 12);
+	table->base_address = pci_resource_start(dev->pdev, 2);
+	table->max_sfs = max_sfs;
+	xa_init(&table->devices);
+	dev->priv.sf_dev_table = table;
+
+	err = mlx5_vhca_event_notifier_register(dev, &table->nb);
+	if (err)
+		goto vhca_err;
+	err = mlx5_sf_dev_vhca_arm_all(table);
+	if (err)
+		goto arm_err;
+	mlx5_core_dbg(dev, "SF DEV: max sf devices=%d\n", max_sfs);
+	return;
+
+arm_err:
+	mlx5_vhca_event_notifier_unregister(dev, &table->nb);
+vhca_err:
+	table->max_sfs = 0;
+	kfree(table);
+	dev->priv.sf_dev_table = NULL;
+table_err:
+	mlx5_core_err(dev, "SF DEV table create err = %d\n", err);
+}
+
+static void mlx5_sf_dev_destroy_all(struct mlx5_sf_dev_table *table)
+{
+	struct mlx5_sf_dev *sf_dev;
+	unsigned long index;
+
+	xa_for_each(&table->devices, index, sf_dev) {
+		xa_erase(&table->devices, index);
+		mlx5_sf_dev_remove(sf_dev);
+	}
+}
+
+void mlx5_sf_dev_table_destroy(struct mlx5_core_dev *dev)
+{
+	struct mlx5_sf_dev_table *table = dev->priv.sf_dev_table;
+
+	if (!table)
+		return;
+
+	mlx5_vhca_event_notifier_unregister(dev, &table->nb);
+
+	/* Now that event handler is not running, it is safe to destroy
+	 * the sf device without race.
+	 */
+	mlx5_sf_dev_destroy_all(table);
+
+	WARN_ON(!xa_empty(&table->devices));
+	kfree(table);
+	dev->priv.sf_dev_table = NULL;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h
new file mode 100644
index 000000000000..a6fb7289ba2c
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies Ltd */
+
+#ifndef __MLX5_SF_DEV_H__
+#define __MLX5_SF_DEV_H__
+
+#ifdef CONFIG_MLX5_SF
+
+#include <linux/auxiliary_bus.h>
+
+#define MLX5_SF_DEV_ID_NAME "sf"
+
+struct mlx5_sf_dev {
+	struct auxiliary_device adev;
+	struct mlx5_core_dev *parent_mdev;
+	phys_addr_t bar_base_addr;
+	u32 sfnum;
+};
+
+void mlx5_sf_dev_table_create(struct mlx5_core_dev *dev);
+void mlx5_sf_dev_table_destroy(struct mlx5_core_dev *dev);
+
+#else
+
+static inline void mlx5_sf_dev_table_create(struct mlx5_core_dev *dev)
+{
+}
+
+static inline void mlx5_sf_dev_table_destroy(struct mlx5_core_dev *dev)
+{
+}
+
+#endif
+
+#endif
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index ffba0786051e..08e5fbe97df0 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -508,6 +508,7 @@ struct mlx5_fw_reset;
 struct mlx5_eq_table;
 struct mlx5_irq_table;
 struct mlx5_vhca_state_notifier;
+struct mlx5_sf_dev_table;
 
 struct mlx5_rate_limit {
 	u32			rate;
@@ -606,6 +607,7 @@ struct mlx5_priv {
 	struct mlx5_uars_page	       *uar;
 #ifdef CONFIG_MLX5_SF
 	struct mlx5_vhca_state_notifier *vhca_state_notifier;
+	struct mlx5_sf_dev_table *sf_dev_table;
 #endif
 };
 
-- 
2.26.2



* [PATCH net-next v2 07/14] net/mlx5: SF, Add auxiliary device driver
  2020-12-09  7:29 [PATCH net-next v2 00/14] Add mlx5 subfunction support saeed
                   ` (5 preceding siblings ...)
  2020-12-09  7:29 ` [PATCH net-next v2 06/14] net/mlx5: SF, Add auxiliary device support saeed
@ 2020-12-09  7:29 ` saeed
  2020-12-09  7:29 ` [PATCH net-next v2 08/14] net/mlx5: E-switch, Prepare eswitch to handle SF vport saeed
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: saeed @ 2020-12-09  7:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit, Vu Pham, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

Add auxiliary device driver for mlx5 subfunction auxiliary device.

An mlx5 subfunction is similar to a PCI PF or VF. An auxiliary device is
created for each subfunction.

As a result, when the mlx5 SF auxiliary device binds to the driver,
its netdev and rdma device are created. They appear as:

$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4

$ ls -l /sys/class/net/eth1/device
/sys/class/net/eth1/device -> ../../../mlx5_core.sf.4

$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88

$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4

$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

$ rdma link show mlx5_0/1
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

$ rdma dev show
8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112
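
The device name mlx5_core.sf.4 above is composed of the module name,
the auxiliary device name and the device id. The sketch below shows how
such a name can be built and matched against a driver id_table entry
such as "mlx5_core.sf". It is a userspace illustration that assumes the
match compares everything before the last '.' of the device name with
the id_table name; aux_dev_name() and aux_name_matches() are
hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build "<modname>.<name>.<id>", e.g. "mlx5_core.sf.4". */
static int aux_dev_name(char *buf, size_t len, const char *modname,
			const char *name, unsigned int id)
{
	return snprintf(buf, len, "%s.%s.%u", modname, name, id);
}

/* Compare the part before the last '.' with an id_table name. */
static bool aux_name_matches(const char *dev_name, const char *id_name)
{
	const char *dot = strrchr(dev_name, '.');
	size_t match_size;

	if (!dot)
		return false;
	match_size = (size_t)(dot - dev_name);
	return strlen(id_name) == match_size &&
	       strncmp(dev_name, id_name, match_size) == 0;
}
```

In this model, "mlx5_core.sf.4" matches the id_table name
"mlx5_core.sf", while a name with a different middle component does
not.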

In the future, the devlink device instance name will carry an sfnum
annotation, either through an alias or through the devlink instance
naming described in RFC [1].

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
Changelog:
v1->v2:
 - auxiliary driver remove method return type changed from int to void
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/devlink.c |  12 +++
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/main.c    |  12 ++-
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  10 ++
 .../net/ethernet/mellanox/mlx5/core/pci_irq.c |  20 ++++
 .../ethernet/mellanox/mlx5/core/sf/dev/dev.c  |  10 ++
 .../ethernet/mellanox/mlx5/core/sf/dev/dev.h  |  20 ++++
 .../mellanox/mlx5/core/sf/dev/driver.c        | 101 ++++++++++++++++++
 include/linux/mlx5/driver.h                   |   4 +-
 10 files changed, 187 insertions(+), 6 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 2f9dede697dc..fbbd08cb70c5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -88,4 +88,4 @@ mlx5_core-$(CONFIG_MLX5_SW_STEERING) += steering/dr_domain.o steering/dr_table.o
 #
 # SF device
 #
-mlx5_core-$(CONFIG_MLX5_SF) += sf/vhca_event.o sf/dev/dev.o
+mlx5_core-$(CONFIG_MLX5_SF) += sf/vhca_event.o sf/dev/dev.o sf/dev/driver.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 3261d0dc1104..9afe918c5827 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -7,6 +7,7 @@
 #include "fw_reset.h"
 #include "fs_core.h"
 #include "eswitch.h"
+#include "sf/dev/dev.h"
 
 static int mlx5_devlink_flash_update(struct devlink *devlink,
 				     struct devlink_flash_update_params *params,
@@ -127,6 +128,17 @@ static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
 				    struct netlink_ext_ack *extack)
 {
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	bool sf_dev_allocated;
+
+	sf_dev_allocated = mlx5_sf_dev_allocated(dev);
+	if (sf_dev_allocated) {
+		/* Reload results in deleting SF device which further results in
+		 * unregistering devlink instance while holding devlink_mutex.
+		 * Hence, do not support reload.
+		 */
+		NL_SET_ERR_MSG_MOD(extack, "reload is unsupported when SFs are allocated");
+		return -EOPNOTSUPP;
+	}
 
 	switch (action) {
 	case DEVLINK_RELOAD_ACTION_DRIVER_REINIT:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index f3e1d5b785ce..eba4a9bb691a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -469,7 +469,7 @@ int mlx5_eq_table_init(struct mlx5_core_dev *dev)
 	for (i = 0; i < MLX5_EVENT_TYPE_MAX; i++)
 		ATOMIC_INIT_NOTIFIER_HEAD(&eq_table->nh[i]);
 
-	eq_table->irq_table = dev->priv.irq_table;
+	eq_table->irq_table = mlx5_irq_table_get(dev);
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 292c30e71d7f..932a280a56a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -84,7 +84,6 @@ unsigned int mlx5_core_debug_mask;
 module_param_named(debug_mask, mlx5_core_debug_mask, uint, 0644);
 MODULE_PARM_DESC(debug_mask, "debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0");
 
-#define MLX5_DEFAULT_PROF	2
 static unsigned int prof_sel = MLX5_DEFAULT_PROF;
 module_param_named(prof_sel, prof_sel, uint, 0444);
 MODULE_PARM_DESC(prof_sel, "profile selector. Valid range 0 - 2");
@@ -1303,7 +1302,7 @@ void mlx5_unload_one(struct mlx5_core_dev *dev, bool cleanup)
 	mutex_unlock(&dev->intf_state_mutex);
 }
 
-static int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx)
+int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx)
 {
 	struct mlx5_priv *priv = &dev->priv;
 	int err;
@@ -1353,7 +1352,7 @@ static int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx)
 	return err;
 }
 
-static void mlx5_mdev_uninit(struct mlx5_core_dev *dev)
+void mlx5_mdev_uninit(struct mlx5_core_dev *dev)
 {
 	struct mlx5_priv *priv = &dev->priv;
 
@@ -1693,6 +1692,10 @@ static int __init init(void)
 	if (err)
 		goto err_debug;
 
+	err = mlx5_sf_driver_register();
+	if (err)
+		goto err_sf;
+
 #ifdef CONFIG_MLX5_CORE_EN
 	err = mlx5e_init();
 	if (err) {
@@ -1703,6 +1706,8 @@ static int __init init(void)
 
 	return 0;
 
+err_sf:
+	pci_unregister_driver(&mlx5_core_driver);
 err_debug:
 	mlx5_unregister_debugfs();
 	return err;
@@ -1713,6 +1718,7 @@ static void __exit cleanup(void)
 #ifdef CONFIG_MLX5_CORE_EN
 	mlx5e_cleanup();
 #endif
+	mlx5_sf_driver_unregister();
 	pci_unregister_driver(&mlx5_core_driver);
 	mlx5_unregister_debugfs();
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index a33b7496d748..3754ef98554f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -117,6 +117,8 @@ enum mlx5_semaphore_space_address {
 	MLX5_SEMAPHORE_SW_RESET         = 0x20,
 };
 
+#define MLX5_DEFAULT_PROF       2
+
 int mlx5_query_hca_caps(struct mlx5_core_dev *dev);
 int mlx5_query_board_id(struct mlx5_core_dev *dev);
 int mlx5_cmd_init(struct mlx5_core_dev *dev);
@@ -176,6 +178,7 @@ struct cpumask *
 mlx5_irq_get_affinity_mask(struct mlx5_irq_table *irq_table, int vecidx);
 struct cpu_rmap *mlx5_irq_get_rmap(struct mlx5_irq_table *table);
 int mlx5_irq_get_num_comp(struct mlx5_irq_table *table);
+struct mlx5_irq_table *mlx5_irq_table_get(struct mlx5_core_dev *dev);
 
 int mlx5_events_init(struct mlx5_core_dev *dev);
 void mlx5_events_cleanup(struct mlx5_core_dev *dev);
@@ -257,6 +260,13 @@ enum {
 u8 mlx5_get_nic_state(struct mlx5_core_dev *dev);
 void mlx5_set_nic_state(struct mlx5_core_dev *dev, u8 state);
 
+static inline bool mlx5_core_is_sf(const struct mlx5_core_dev *dev)
+{
+	return dev->coredev_type == MLX5_COREDEV_SF;
+}
+
+int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx);
+void mlx5_mdev_uninit(struct mlx5_core_dev *dev);
 void mlx5_unload_one(struct mlx5_core_dev *dev, bool cleanup);
 int mlx5_load_one(struct mlx5_core_dev *dev, bool boot);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
index 6fd974920394..a61e09aff152 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
@@ -30,6 +30,9 @@ int mlx5_irq_table_init(struct mlx5_core_dev *dev)
 {
 	struct mlx5_irq_table *irq_table;
 
+	if (mlx5_core_is_sf(dev))
+		return 0;
+
 	irq_table = kvzalloc(sizeof(*irq_table), GFP_KERNEL);
 	if (!irq_table)
 		return -ENOMEM;
@@ -40,6 +43,9 @@ int mlx5_irq_table_init(struct mlx5_core_dev *dev)
 
 void mlx5_irq_table_cleanup(struct mlx5_core_dev *dev)
 {
+	if (mlx5_core_is_sf(dev))
+		return;
+
 	kvfree(dev->priv.irq_table);
 }
 
@@ -268,6 +274,9 @@ int mlx5_irq_table_create(struct mlx5_core_dev *dev)
 	int nvec;
 	int err;
 
+	if (mlx5_core_is_sf(dev))
+		return 0;
+
 	nvec = MLX5_CAP_GEN(dev, num_ports) * num_online_cpus() +
 	       MLX5_IRQ_VEC_COMP_BASE;
 	nvec = min_t(int, nvec, num_eqs);
@@ -319,6 +328,9 @@ void mlx5_irq_table_destroy(struct mlx5_core_dev *dev)
 	struct mlx5_irq_table *table = dev->priv.irq_table;
 	int i;
 
+	if (mlx5_core_is_sf(dev))
+		return;
+
 	/* free_irq requires that affinity and rmap will be cleared
 	 * before calling it. This is why there is asymmetry with set_rmap
 	 * which should be called after alloc_irq but before request_irq.
@@ -332,3 +344,11 @@ void mlx5_irq_table_destroy(struct mlx5_core_dev *dev)
 	kfree(table->irq);
 }
 
+struct mlx5_irq_table *mlx5_irq_table_get(struct mlx5_core_dev *dev)
+{
+#ifdef CONFIG_MLX5_SF
+	if (mlx5_core_is_sf(dev))
+		return dev->priv.parent_mdev->priv.irq_table;
+#endif
+	return dev->priv.irq_table;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
index 6562bf63afaa..2675b85d202d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
@@ -24,6 +24,16 @@ static bool mlx5_sf_dev_supported(const struct mlx5_core_dev *dev)
 	return MLX5_CAP_GEN(dev, sf) && mlx5_vhca_event_supported(dev);
 }
 
+bool mlx5_sf_dev_allocated(const struct mlx5_core_dev *dev)
+{
+	struct mlx5_sf_dev_table *table = dev->priv.sf_dev_table;
+
+	if (!mlx5_sf_dev_supported(dev))
+		return false;
+
+	return !xa_empty(&table->devices);
+}
+
 static ssize_t sfnum_show(struct device *dev, struct device_attribute *attr, char *buf)
 {
 	struct auxiliary_device *adev = container_of(dev, struct auxiliary_device, dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h
index a6fb7289ba2c..4de02902aef1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h
@@ -13,6 +13,7 @@
 struct mlx5_sf_dev {
 	struct auxiliary_device adev;
 	struct mlx5_core_dev *parent_mdev;
+	struct mlx5_core_dev *mdev;
 	phys_addr_t bar_base_addr;
 	u32 sfnum;
 };
@@ -20,6 +21,11 @@ struct mlx5_sf_dev {
 void mlx5_sf_dev_table_create(struct mlx5_core_dev *dev);
 void mlx5_sf_dev_table_destroy(struct mlx5_core_dev *dev);
 
+int mlx5_sf_driver_register(void);
+void mlx5_sf_driver_unregister(void);
+
+bool mlx5_sf_dev_allocated(const struct mlx5_core_dev *dev);
+
 #else
 
 static inline void mlx5_sf_dev_table_create(struct mlx5_core_dev *dev)
@@ -30,6 +36,20 @@ static inline void mlx5_sf_dev_table_destroy(struct mlx5_core_dev *dev)
 {
 }
 
+static inline int mlx5_sf_driver_register(void)
+{
+	return 0;
+}
+
+static inline void mlx5_sf_driver_unregister(void)
+{
+}
+
+static inline bool mlx5_sf_dev_allocated(const struct mlx5_core_dev *dev)
+{
+	return false;
+}
+
 #endif
 
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
new file mode 100644
index 000000000000..9a1ad331ce0a
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd */
+
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/device.h>
+#include "mlx5_core.h"
+#include "dev.h"
+#include "devlink.h"
+
+static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
+{
+	struct mlx5_sf_dev *sf_dev = container_of(adev, struct mlx5_sf_dev, adev);
+	struct mlx5_core_dev *mdev;
+	struct devlink *devlink;
+	int err;
+
+	devlink = mlx5_devlink_alloc();
+	if (!devlink)
+		return -ENOMEM;
+
+	mdev = devlink_priv(devlink);
+	mdev->device = &adev->dev;
+	mdev->pdev = sf_dev->parent_mdev->pdev;
+	mdev->bar_addr = sf_dev->bar_base_addr;
+	mdev->iseg_base = sf_dev->bar_base_addr;
+	mdev->coredev_type = MLX5_COREDEV_SF;
+	mdev->priv.parent_mdev = sf_dev->parent_mdev;
+	mdev->priv.adev_idx = adev->id;
+	sf_dev->mdev = mdev;
+
+	err = mlx5_mdev_init(mdev, MLX5_DEFAULT_PROF);
+	if (err) {
+		mlx5_core_warn(mdev, "mlx5_mdev_init err=%d\n", err);
+		goto mdev_err;
+	}
+
+	mdev->iseg = ioremap(mdev->iseg_base, sizeof(*mdev->iseg));
+	if (!mdev->iseg) {
+		mlx5_core_warn(mdev, "remap error\n");
+		err = -ENOMEM;
+		goto remap_err;
+	}
+
+	err = mlx5_load_one(mdev, true);
+	if (err) {
+		mlx5_core_warn(mdev, "mlx5_load_one err=%d\n", err);
+		goto load_one_err;
+	}
+	return 0;
+
+load_one_err:
+	iounmap(mdev->iseg);
+remap_err:
+	mlx5_mdev_uninit(mdev);
+mdev_err:
+	mlx5_devlink_free(devlink);
+	return err;
+}
+
+static void mlx5_sf_dev_remove(struct auxiliary_device *adev)
+{
+	struct mlx5_sf_dev *sf_dev = container_of(adev, struct mlx5_sf_dev, adev);
+	struct devlink *devlink;
+
+	devlink = priv_to_devlink(sf_dev->mdev);
+	mlx5_unload_one(sf_dev->mdev, true);
+	iounmap(sf_dev->mdev->iseg);
+	mlx5_mdev_uninit(sf_dev->mdev);
+	mlx5_devlink_free(devlink);
+}
+
+static void mlx5_sf_dev_shutdown(struct auxiliary_device *adev)
+{
+	struct mlx5_sf_dev *sf_dev = container_of(adev, struct mlx5_sf_dev, adev);
+
+	mlx5_unload_one(sf_dev->mdev, false);
+}
+
+static const struct auxiliary_device_id mlx5_sf_dev_id_table[] = {
+	{ .name = KBUILD_MODNAME "." MLX5_SF_DEV_ID_NAME, },
+	{ },
+};
+
+MODULE_DEVICE_TABLE(auxiliary, mlx5_sf_dev_id_table);
+
+static struct auxiliary_driver mlx5_sf_driver = {
+	.name = KBUILD_MODNAME,
+	.probe = mlx5_sf_dev_probe,
+	.remove = mlx5_sf_dev_remove,
+	.shutdown = mlx5_sf_dev_shutdown,
+	.id_table = mlx5_sf_dev_id_table,
+};
+
+int mlx5_sf_driver_register(void)
+{
+	return auxiliary_driver_register(&mlx5_sf_driver);
+}
+
+void mlx5_sf_driver_unregister(void)
+{
+	auxiliary_driver_unregister(&mlx5_sf_driver);
+}
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 08e5fbe97df0..48e3638b1185 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -193,7 +193,8 @@ enum port_state_policy {
 
 enum mlx5_coredev_type {
 	MLX5_COREDEV_PF,
-	MLX5_COREDEV_VF
+	MLX5_COREDEV_VF,
+	MLX5_COREDEV_SF,
 };
 
 struct mlx5_field_desc {
@@ -608,6 +609,7 @@ struct mlx5_priv {
 #ifdef CONFIG_MLX5_SF
 	struct mlx5_vhca_state_notifier *vhca_state_notifier;
 	struct mlx5_sf_dev_table *sf_dev_table;
+	struct mlx5_core_dev *parent_mdev;
 #endif
 };
 
-- 
2.26.2



* [PATCH net-next v2 08/14] net/mlx5: E-switch, Prepare eswitch to handle SF vport
  2020-12-09  7:29 [PATCH net-next v2 00/14] Add mlx5 subfunction support saeed
                   ` (6 preceding siblings ...)
  2020-12-09  7:29 ` [PATCH net-next v2 07/14] net/mlx5: SF, Add auxiliary device driver saeed
@ 2020-12-09  7:29 ` saeed
  2020-12-09  7:29 ` [PATCH net-next v2 09/14] net/mlx5: E-switch, Add eswitch helpers for " saeed
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: saeed @ 2020-12-09  7:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Vu Pham, Parav Pandit, Roi Dayan, Saeed Mahameed

From: Vu Pham <vuhuong@nvidia.com>

Prepare eswitch to handle SF vports during
(a) querying eswitch functions
(b) egress ACL creation
(c) total vports calculation
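
For (a), the query output buffer grows by a whole number of 64-bit
words sized from the maximum number of SF vports. The sketch below
reproduces only that size calculation from the eswitch.c hunk in this
patch. sf_words_bytes() and query_outlen() are hypothetical names, and
the base length passed in is an arbitrary example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Bytes needed to cover max_sf_vports bits in whole 64-bit words. */
static size_t sf_words_bytes(unsigned int max_sf_vports)
{
	const unsigned int bits_per_word = 8 * sizeof(uint64_t);

	/* The patch keeps this count in a u16. */
	assert(max_sf_vports <= UINT16_MAX);
	return DIV_ROUND_UP(max_sf_vports, bits_per_word) * sizeof(uint64_t);
}

/* Total output length: fixed part plus the SF words, if any. */
static size_t query_outlen(size_t base_outlen, unsigned int max_sf_vports)
{
	return base_outlen + sf_words_bytes(max_sf_vports);
}
```

For example, 65 SF vports need two 64-bit words, so 16 bytes are added
to the fixed part of the output.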

Assign a dedicated placeholder for SF vports and their representors.
They are placed after the VF vports and before the ECPF vport, as below:
[PF,VF0,...,VFn,SF0,...,SFm,ECPF,UPLINK].

Change functions to map SF's vport numbers to indices when
accessing the vports or representors arrays, and vice versa.
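
The mapping between SF vport numbers and array indices can be sketched
as below, following the layout [PF,VF0,...,VFn,SF0,...,SFm,ECPF,UPLINK].
This is a userspace illustration only. struct esw_layout, the helper
names and the sf_base_vport field are hypothetical, and the value 1 for
the PF placeholder is an assumption made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VPORT_PF_PLACEHOLDER 1	/* assumed: one slot for the PF vport */

struct esw_layout {
	unsigned int max_vfs;
	unsigned int max_sfs;
	uint16_t sf_base_vport;	/* hypothetical: vport number of SF0 */
};

/* SF slots start right after the PF slot and the VF slots. */
static int sf_start_idx(const struct esw_layout *l)
{
	return VPORT_PF_PLACEHOLDER + (int)l->max_vfs;
}

/* One past the last SF slot; the ECPF slot follows in the layout. */
static int sf_end_idx(const struct esw_layout *l)
{
	return sf_start_idx(l) + (int)l->max_sfs;
}

static bool is_sf_vport(const struct esw_layout *l, uint16_t vport_num)
{
	return vport_num >= l->sf_base_vport &&
	       (unsigned int)(vport_num - l->sf_base_vport) < l->max_sfs;
}

static int sf_vport_num_to_index(const struct esw_layout *l,
				 uint16_t vport_num)
{
	assert(is_sf_vport(l, vport_num));
	return sf_start_idx(l) + (vport_num - l->sf_base_vport);
}

static uint16_t sf_index_to_vport_num(const struct esw_layout *l, int idx)
{
	assert(idx >= sf_start_idx(l) && idx < sf_end_idx(l));
	return (uint16_t)(l->sf_base_vport + (idx - sf_start_idx(l)));
}
```

With 4 VFs and 3 SFs, the SF slots occupy indices 5 to 7, so the ECPF
slot lands at index 8 in this model.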

Signed-off-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/Kconfig   | 10 ++++
 .../mellanox/mlx5/core/esw/acl/egress_ofld.c  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 11 +++-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h | 50 +++++++++++++++++++
 .../mellanox/mlx5/core/eswitch_offloads.c     | 11 ++++
 .../net/ethernet/mellanox/mlx5/core/vport.c   |  3 +-
 6 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 10dfaf671c90..11d5e0e99bd6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -211,3 +211,13 @@ config MLX5_SF
 	Build support for subfuction device in the NIC. A Mellanox subfunction
 	device can support RDMA, netdevice and vdpa device.
 	It is similar to a SRIOV VF but it doesn't require SRIOV support.
+
+config MLX5_SF_MANAGER
+	bool
+	depends on MLX5_SF && MLX5_ESWITCH
+	default y
+	help
+	Build support for subfunction port in the NIC. A Mellanox subfunction
+	port is managed through devlink.  A subfunction supports RDMA, netdevice
+	and vdpa device. It is similar to a SRIOV VF but it doesn't require
+	SRIOV support.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c
index c3faae67e4d6..45758ff3c14e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_ofld.c
@@ -150,7 +150,7 @@ static void esw_acl_egress_ofld_groups_destroy(struct mlx5_vport *vport)
 
 static bool esw_acl_egress_needed(const struct mlx5_eswitch *esw, u16 vport_num)
 {
-	return mlx5_eswitch_is_vf_vport(esw, vport_num);
+	return mlx5_eswitch_is_vf_vport(esw, vport_num) || mlx5_esw_is_sf_vport(esw, vport_num);
 }
 
 int esw_acl_egress_ofld_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index da901e364656..d75247a8ce55 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1366,9 +1366,15 @@ const u32 *mlx5_esw_query_functions(struct mlx5_core_dev *dev)
 {
 	int outlen = MLX5_ST_SZ_BYTES(query_esw_functions_out);
 	u32 in[MLX5_ST_SZ_DW(query_esw_functions_in)] = {};
+	u16 max_sf_vports;
 	u32 *out;
 	int err;
 
+	max_sf_vports = mlx5_sf_max_functions(dev);
+	/* Device interface is array of 64-bits */
+	if (max_sf_vports)
+		outlen += DIV_ROUND_UP(max_sf_vports, BITS_PER_TYPE(__be64)) * sizeof(__be64);
+
 	out = kvzalloc(outlen, GFP_KERNEL);
 	if (!out)
 		return ERR_PTR(-ENOMEM);
@@ -1376,7 +1382,7 @@ const u32 *mlx5_esw_query_functions(struct mlx5_core_dev *dev)
 	MLX5_SET(query_esw_functions_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_ESW_FUNCTIONS);
 
-	err = mlx5_cmd_exec_inout(dev, query_esw_functions, in, out);
+	err = mlx5_cmd_exec(dev, in, sizeof(in), out, outlen);
 	if (!err)
 		return out;
 
@@ -1899,7 +1905,8 @@ static bool
 is_port_function_supported(const struct mlx5_eswitch *esw, u16 vport_num)
 {
 	return vport_num == MLX5_VPORT_PF ||
-	       mlx5_eswitch_is_vf_vport(esw, vport_num);
+	       mlx5_eswitch_is_vf_vport(esw, vport_num) ||
+	       mlx5_esw_is_sf_vport(esw, vport_num);
 }
 
 int mlx5_devlink_port_function_hw_addr_get(struct devlink *devlink,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index cf87de94418f..4e3ed878ff03 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -43,6 +43,7 @@
 #include <linux/mlx5/fs.h>
 #include "lib/mpfs.h"
 #include "lib/fs_chains.h"
+#include "sf/sf.h"
 #include "en/tc_ct.h"
 
 #ifdef CONFIG_MLX5_ESWITCH
@@ -499,6 +500,40 @@ static inline u16 mlx5_eswitch_first_host_vport_num(struct mlx5_core_dev *dev)
 		MLX5_VPORT_PF : MLX5_VPORT_FIRST_VF;
 }
 
+static inline int mlx5_esw_sf_start_idx(const struct mlx5_eswitch *esw)
+{
+	/* PF and VF vports indices start from 0 to max_vfs */
+	return MLX5_VPORT_PF_PLACEHOLDER + mlx5_core_max_vfs(esw->dev);
+}
+
+static inline int mlx5_esw_sf_end_idx(const struct mlx5_eswitch *esw)
+{
+	return mlx5_esw_sf_start_idx(esw) + mlx5_sf_max_functions(esw->dev);
+}
+
+static inline int
+mlx5_esw_sf_vport_num_to_index(const struct mlx5_eswitch *esw, u16 vport_num)
+{
+	return vport_num - mlx5_sf_start_function_id(esw->dev) +
+	       MLX5_VPORT_PF_PLACEHOLDER + mlx5_core_max_vfs(esw->dev);
+}
+
+static inline u16
+mlx5_esw_sf_vport_index_to_num(const struct mlx5_eswitch *esw, int idx)
+{
+	return mlx5_sf_start_function_id(esw->dev) + idx -
+	       (MLX5_VPORT_PF_PLACEHOLDER + mlx5_core_max_vfs(esw->dev));
+}
+
+static inline bool
+mlx5_esw_is_sf_vport(const struct mlx5_eswitch *esw, u16 vport_num)
+{
+	return mlx5_sf_supported(esw->dev) &&
+	       vport_num >= mlx5_sf_start_function_id(esw->dev) &&
+	       (vport_num < (mlx5_sf_start_function_id(esw->dev) +
+			     mlx5_sf_max_functions(esw->dev)));
+}
+
 static inline bool mlx5_eswitch_is_funcs_handler(const struct mlx5_core_dev *dev)
 {
 	return mlx5_core_is_ecpf_esw_manager(dev);
@@ -527,6 +562,10 @@ static inline int mlx5_eswitch_vport_num_to_index(struct mlx5_eswitch *esw,
 	if (vport_num == MLX5_VPORT_UPLINK)
 		return mlx5_eswitch_uplink_idx(esw);
 
+	if (mlx5_esw_is_sf_vport(esw, vport_num))
+		return mlx5_esw_sf_vport_num_to_index(esw, vport_num);
+
+	/* PF and VF vports start from 0 to max_vfs */
 	return vport_num;
 }
 
@@ -540,6 +579,12 @@ static inline u16 mlx5_eswitch_index_to_vport_num(struct mlx5_eswitch *esw,
 	if (index == mlx5_eswitch_uplink_idx(esw))
 		return MLX5_VPORT_UPLINK;
 
+	/* SF vports indices are after VFs and before ECPF */
+	if (mlx5_sf_supported(esw->dev) &&
+	    index > mlx5_core_max_vfs(esw->dev))
+		return mlx5_esw_sf_vport_index_to_num(esw, index);
+
+	/* PF and VF vports start from 0 to max_vfs */
 	return index;
 }
 
@@ -625,6 +670,11 @@ void mlx5e_tc_clean_fdb_peer_flows(struct mlx5_eswitch *esw);
 	for ((vport) = (nvfs);						\
 	     (vport) >= (esw)->first_host_vport; (vport)--)
 
+#define mlx5_esw_for_each_sf_rep(esw, i, rep)		\
+	for ((i) = mlx5_esw_sf_start_idx(esw);		\
+	     (rep) = &(esw)->offloads.vport_reps[(i)],	\
+	     (i) < mlx5_esw_sf_end_idx(esw); (i++))
+
 struct mlx5_eswitch *mlx5_devlink_eswitch_get(struct devlink *devlink);
 struct mlx5_vport *__must_check
 mlx5_eswitch_get_vport(struct mlx5_eswitch *esw, u16 vport_num);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index c9c2962ad49f..f6c8dae739c2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -1801,11 +1801,22 @@ static void __esw_offloads_unload_rep(struct mlx5_eswitch *esw,
 		esw->offloads.rep_ops[rep_type]->unload(rep);
 }
 
+static void __unload_reps_sf_vport(struct mlx5_eswitch *esw, u8 rep_type)
+{
+	struct mlx5_eswitch_rep *rep;
+	int i;
+
+	mlx5_esw_for_each_sf_rep(esw, i, rep)
+		__esw_offloads_unload_rep(esw, rep, rep_type);
+}
+
 static void __unload_reps_all_vport(struct mlx5_eswitch *esw, u8 rep_type)
 {
 	struct mlx5_eswitch_rep *rep;
 	int i;
 
+	__unload_reps_sf_vport(esw, rep_type);
+
 	mlx5_esw_for_each_vf_rep_reverse(esw, i, rep, esw->esw_funcs.num_vfs)
 		__esw_offloads_unload_rep(esw, rep, rep_type);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index bdafc85fd874..ba78e0660523 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -36,6 +36,7 @@
 #include <linux/mlx5/vport.h>
 #include <linux/mlx5/eswitch.h>
 #include "mlx5_core.h"
+#include "sf/sf.h"
 
 /* Mutex to hold while enabling or disabling RoCE */
 static DEFINE_MUTEX(mlx5_roce_en_lock);
@@ -1160,6 +1161,6 @@ EXPORT_SYMBOL_GPL(mlx5_query_nic_system_image_guid);
  */
 u16 mlx5_eswitch_get_total_vports(const struct mlx5_core_dev *dev)
 {
-	return MLX5_SPECIAL_VPORTS(dev) + mlx5_core_max_vfs(dev);
+	return MLX5_SPECIAL_VPORTS(dev) + mlx5_core_max_vfs(dev) + mlx5_sf_max_functions(dev);
 }
 EXPORT_SYMBOL_GPL(mlx5_eswitch_get_total_vports);
-- 
2.26.2



* [PATCH net-next v2 09/14] net/mlx5: E-switch, Add eswitch helpers for SF vport
  2020-12-09  7:29 [PATCH net-next v2 00/14] Add mlx5 subfunction support saeed
                   ` (7 preceding siblings ...)
  2020-12-09  7:29 ` [PATCH net-next v2 08/14] net/mlx5: E-switch, Prepare eswitch to handle SF vport saeed
@ 2020-12-09  7:29 ` saeed
  2020-12-09  7:29 ` [PATCH net-next v2 10/14] net/mlx5: SF, Add port add delete functionality saeed
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: saeed @ 2020-12-09  7:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit, Vu Pham, Roi Dayan, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

Add helpers to enable and disable an SF eswitch vport, register and
unregister its devlink port, and load and unload its representor.
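
The ordering these helpers follow (enable the vport, register the devlink
port, load the representor, and undo the completed steps in reverse when a
later step fails) can be modelled in userspace. This is only a minimal
sketch: step_up(), step_down(), fail_at and active[] are hypothetical
stand-ins for illustration, not functions or state from the driver.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for the three setup steps; these are not the
 * driver's functions, only a userspace model of the ordering.
 */
enum step { STEP_VPORT, STEP_DEVLINK_PORT, STEP_REP, STEP_COUNT };

int fail_at = -1;		/* step that should fail, or -1 for none */
int active[STEP_COUNT];		/* 1 while the matching step is set up */

static int step_up(enum step s)
{
	if ((int)s == fail_at)
		return -1;
	active[s] = 1;
	return 0;
}

static void step_down(enum step s)
{
	assert(active[s]);	/* only tear down what was set up */
	active[s] = 0;
}

/* Set up in order; on failure undo the completed steps in reverse. */
int sf_vport_enable(void)
{
	int err;

	err = step_up(STEP_VPORT);
	if (err)
		return err;

	err = step_up(STEP_DEVLINK_PORT);
	if (err)
		goto devlink_err;

	err = step_up(STEP_REP);
	if (err)
		goto rep_err;
	return 0;

rep_err:
	step_down(STEP_DEVLINK_PORT);
devlink_err:
	step_down(STEP_VPORT);
	return err;
}

/* Tear down in the exact reverse of the setup order. */
void sf_vport_disable(void)
{
	step_down(STEP_REP);
	step_down(STEP_DEVLINK_PORT);
	step_down(STEP_VPORT);
}

/* Number of steps currently set up. */
int active_count(void)
{
	int i, n = 0;

	for (i = 0; i < STEP_COUNT; i++)
		n += active[i];
	return n;
}
```

Setting fail_at to any step and calling sf_vport_enable() should leave
active_count() at zero, which is the property the goto labels are there
to preserve.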

Signed-off-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
Changelog:
v1->v2:
 - cut down extra api for vport enable/disable
---
 .../mellanox/mlx5/core/esw/devlink_port.c     | 41 +++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 12 +++---
 .../net/ethernet/mellanox/mlx5/core/eswitch.h | 16 ++++++++
 .../mellanox/mlx5/core/eswitch_offloads.c     | 36 +++++++++++++++-
 4 files changed, 97 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
index ffff11baa3d0..4b7e9f783789 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
@@ -122,3 +122,44 @@ struct devlink_port *mlx5_esw_offloads_devlink_port(struct mlx5_eswitch *esw, u1
 	vport = mlx5_eswitch_get_vport(esw, vport_num);
 	return vport->dl_port;
 }
+
+int mlx5_esw_devlink_sf_port_register(struct mlx5_eswitch *esw, struct devlink_port *dl_port,
+				      u16 vport_num, u32 sfnum)
+{
+	struct mlx5_core_dev *dev = esw->dev;
+	struct netdev_phys_item_id ppid = {};
+	unsigned int dl_port_index;
+	struct mlx5_vport *vport;
+	struct devlink *devlink;
+	u16 pfnum;
+	int err;
+
+	vport = mlx5_eswitch_get_vport(esw, vport_num);
+	if (IS_ERR(vport))
+		return PTR_ERR(vport);
+
+	pfnum = PCI_FUNC(dev->pdev->devfn);
+	mlx5_esw_get_port_parent_id(dev, &ppid);
+	memcpy(dl_port->attrs.switch_id.id, &ppid.id[0], ppid.id_len);
+	dl_port->attrs.switch_id.id_len = ppid.id_len;
+	devlink_port_attrs_pci_sf_set(dl_port, 0, pfnum, sfnum, false);
+	devlink = priv_to_devlink(dev);
+	dl_port_index = mlx5_esw_vport_to_devlink_port_index(dev, vport_num);
+	err = devlink_port_register(devlink, dl_port, dl_port_index);
+	if (err)
+		return err;
+
+	vport->dl_port = dl_port;
+	return 0;
+}
+
+void mlx5_esw_devlink_sf_port_unregister(struct mlx5_eswitch *esw, u16 vport_num)
+{
+	struct mlx5_vport *vport;
+
+	vport = mlx5_eswitch_get_vport(esw, vport_num);
+	if (IS_ERR(vport))
+		return;
+	devlink_port_unregister(vport->dl_port);
+	vport->dl_port = NULL;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index d75247a8ce55..d06e7a5f15de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1273,8 +1273,8 @@ static void esw_vport_cleanup(struct mlx5_eswitch *esw, struct mlx5_vport *vport
 	esw_vport_cleanup_acl(esw, vport);
 }
 
-static int esw_enable_vport(struct mlx5_eswitch *esw, u16 vport_num,
-			    enum mlx5_eswitch_vport_event enabled_events)
+int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, u16 vport_num,
+			  enum mlx5_eswitch_vport_event enabled_events)
 {
 	struct mlx5_vport *vport;
 	int ret;
@@ -1310,7 +1310,7 @@ static int esw_enable_vport(struct mlx5_eswitch *esw, u16 vport_num,
 	return ret;
 }
 
-static void esw_disable_vport(struct mlx5_eswitch *esw, u16 vport_num)
+void mlx5_esw_vport_disable(struct mlx5_eswitch *esw, u16 vport_num)
 {
 	struct mlx5_vport *vport;
 
@@ -1432,7 +1432,7 @@ int mlx5_eswitch_load_vport(struct mlx5_eswitch *esw, u16 vport_num,
 {
 	int err;
 
-	err = esw_enable_vport(esw, vport_num, enabled_events);
+	err = mlx5_esw_vport_enable(esw, vport_num, enabled_events);
 	if (err)
 		return err;
 
@@ -1443,14 +1443,14 @@ int mlx5_eswitch_load_vport(struct mlx5_eswitch *esw, u16 vport_num,
 	return err;
 
 err_rep:
-	esw_disable_vport(esw, vport_num);
+	mlx5_esw_vport_disable(esw, vport_num);
 	return err;
 }
 
 void mlx5_eswitch_unload_vport(struct mlx5_eswitch *esw, u16 vport_num)
 {
 	esw_offloads_unload_rep(esw, vport_num);
-	esw_disable_vport(esw, vport_num);
+	mlx5_esw_vport_disable(esw, vport_num);
 }
 
 void mlx5_eswitch_unload_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 4e3ed878ff03..54514b04808d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -688,6 +688,10 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
 				 enum mlx5_eswitch_vport_event enabled_events);
 void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw);
 
+int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, u16 vport_num,
+			  enum mlx5_eswitch_vport_event enabled_events);
+void mlx5_esw_vport_disable(struct mlx5_eswitch *esw, u16 vport_num);
+
 int
 esw_vport_create_offloads_acl_tables(struct mlx5_eswitch *esw,
 				     struct mlx5_vport *vport);
@@ -706,6 +710,9 @@ esw_get_max_restore_tag(struct mlx5_eswitch *esw);
 int esw_offloads_load_rep(struct mlx5_eswitch *esw, u16 vport_num);
 void esw_offloads_unload_rep(struct mlx5_eswitch *esw, u16 vport_num);
 
+int mlx5_esw_offloads_rep_load(struct mlx5_eswitch *esw, u16 vport_num);
+void mlx5_esw_offloads_rep_unload(struct mlx5_eswitch *esw, u16 vport_num);
+
 int mlx5_eswitch_load_vport(struct mlx5_eswitch *esw, u16 vport_num,
 			    enum mlx5_eswitch_vport_event enabled_events);
 void mlx5_eswitch_unload_vport(struct mlx5_eswitch *esw, u16 vport_num);
@@ -717,6 +724,15 @@ void mlx5_eswitch_unload_vf_vports(struct mlx5_eswitch *esw, u16 num_vfs);
 int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_num);
 void mlx5_esw_offloads_devlink_port_unregister(struct mlx5_eswitch *esw, u16 vport_num);
 struct devlink_port *mlx5_esw_offloads_devlink_port(struct mlx5_eswitch *esw, u16 vport_num);
+
+int mlx5_esw_devlink_sf_port_register(struct mlx5_eswitch *esw, struct devlink_port *dl_port,
+				      u16 vport_num, u32 sfnum);
+void mlx5_esw_devlink_sf_port_unregister(struct mlx5_eswitch *esw, u16 vport_num);
+
+int mlx5_esw_offloads_sf_vport_enable(struct mlx5_eswitch *esw, struct devlink_port *dl_port,
+				      u16 vport_num, u32 sfnum);
+void mlx5_esw_offloads_sf_vport_disable(struct mlx5_eswitch *esw, u16 vport_num);
+
 #else  /* CONFIG_MLX5_ESWITCH */
 /* eswitch API stubs */
 static inline int  mlx5_eswitch_init(struct mlx5_core_dev *dev) { return 0; }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index f6c8dae739c2..0d76d80022ce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -1834,7 +1834,7 @@ static void __unload_reps_all_vport(struct mlx5_eswitch *esw, u8 rep_type)
 	__esw_offloads_unload_rep(esw, rep, rep_type);
 }
 
-static int mlx5_esw_offloads_rep_load(struct mlx5_eswitch *esw, u16 vport_num)
+int mlx5_esw_offloads_rep_load(struct mlx5_eswitch *esw, u16 vport_num)
 {
 	struct mlx5_eswitch_rep *rep;
 	int rep_type;
@@ -1858,7 +1858,7 @@ static int mlx5_esw_offloads_rep_load(struct mlx5_eswitch *esw, u16 vport_num)
 	return err;
 }
 
-static void mlx5_esw_offloads_rep_unload(struct mlx5_eswitch *esw, u16 vport_num)
+void mlx5_esw_offloads_rep_unload(struct mlx5_eswitch *esw, u16 vport_num)
 {
 	struct mlx5_eswitch_rep *rep;
 	int rep_type;
@@ -2836,3 +2836,35 @@ u32 mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw,
 	return vport->metadata << (32 - ESW_SOURCE_PORT_METADATA_BITS);
 }
 EXPORT_SYMBOL(mlx5_eswitch_get_vport_metadata_for_match);
+
+int mlx5_esw_offloads_sf_vport_enable(struct mlx5_eswitch *esw, struct devlink_port *dl_port,
+				      u16 vport_num, u32 sfnum)
+{
+	int err;
+
+	err = mlx5_esw_vport_enable(esw, vport_num, MLX5_VPORT_UC_ADDR_CHANGE);
+	if (err)
+		return err;
+
+	err = mlx5_esw_devlink_sf_port_register(esw, dl_port, vport_num, sfnum);
+	if (err)
+		goto devlink_err;
+
+	err = mlx5_esw_offloads_rep_load(esw, vport_num);
+	if (err)
+		goto rep_err;
+	return 0;
+
+rep_err:
+	mlx5_esw_devlink_sf_port_unregister(esw, vport_num);
+devlink_err:
+	mlx5_esw_vport_disable(esw, vport_num);
+	return err;
+}
+
+void mlx5_esw_offloads_sf_vport_disable(struct mlx5_eswitch *esw, u16 vport_num)
+{
+	mlx5_esw_offloads_rep_unload(esw, vport_num);
+	mlx5_esw_devlink_sf_port_unregister(esw, vport_num);
+	mlx5_esw_vport_disable(esw, vport_num);
+}
-- 
2.26.2



* [PATCH net-next v2 10/14] net/mlx5: SF, Add port add delete functionality
  2020-12-09  7:29 [PATCH net-next v2 00/14] Add mlx5 subfunction support saeed
                   ` (8 preceding siblings ...)
  2020-12-09  7:29 ` [PATCH net-next v2 09/14] net/mlx5: E-switch, Add eswitch helpers for " saeed
@ 2020-12-09  7:29 ` saeed
  2020-12-09  7:29 ` [PATCH net-next v2 11/14] net/mlx5: SF, Port function state change support saeed
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: saeed @ 2020-12-09  7:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit, Vu Pham, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

To handle SF port management outside of the eswitch, as an independent
software layer, introduce eswitch notifier APIs. An upper layer that
wishes to support SF port management in switchdev mode can then perform
its task whenever the eswitch mode is set to switchdev, or before the
eswitch is disabled.

Initialize the SF port table on such an eswitch event.
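
The notifier flow described above can be approximated with a small
userspace model. This is a hedged sketch only: the kernel code uses a
blocking notifier chain, while the singly linked list, the esw_* and
sf_* names, and the choice to enable the table on the offloads mode and
drop all ports on "none" are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode values; they only mirror the idea of eswitch modes. */
enum esw_mode { ESW_MODE_NONE, ESW_MODE_LEGACY, ESW_MODE_OFFLOADS };

struct esw_event_info {
	enum esw_mode new_mode;
};

struct esw_notifier {
	void (*call)(struct esw_notifier *nb, const struct esw_event_info *info);
	struct esw_notifier *next;
};

struct eswitch {
	struct esw_notifier *head;	/* listeners interested in mode changes */
};

void esw_notifier_register(struct eswitch *esw, struct esw_notifier *nb)
{
	assert(nb->call);
	nb->next = esw->head;
	esw->head = nb;
}

void esw_notifier_unregister(struct eswitch *esw, struct esw_notifier *nb)
{
	struct esw_notifier **p;

	for (p = &esw->head; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Tell every listener which mode the eswitch is entering. */
void esw_mode_change_notify(struct eswitch *esw, enum esw_mode mode)
{
	struct esw_event_info info = { .new_mode = mode };
	struct esw_notifier *nb;

	for (nb = esw->head; nb; nb = nb->next)
		nb->call(nb, &info);
}

/* The upper layer: an SF port table that reacts to mode changes. */
struct sf_table {
	struct esw_notifier nb;
	int enabled;
	int nr_ports;
};

static void sf_table_event(struct esw_notifier *nb,
			   const struct esw_event_info *info)
{
	struct sf_table *t = (struct sf_table *)
		((char *)nb - offsetof(struct sf_table, nb));

	switch (info->new_mode) {
	case ESW_MODE_OFFLOADS:
		t->enabled = 1;		/* port add/delete is now allowed */
		break;
	case ESW_MODE_NONE:
		t->nr_ports = 0;	/* destroy all SF ports first */
		t->enabled = 0;
		break;
	default:
		break;
	}
}

void sf_table_init(struct sf_table *t, struct eswitch *esw)
{
	t->nb.call = sf_table_event;
	t->nb.next = NULL;
	t->enabled = 0;
	t->nr_ports = 0;
	esw_notifier_register(esw, &t->nb);
}

/* Adding a port is refused unless the table has been enabled. */
int sf_port_add(struct sf_table *t)
{
	if (!t->enabled)
		return -1;
	t->nr_ports++;
	return 0;
}
```

Keeping the table behind a notifier means the eswitch core never calls
into the SF layer by name; it only announces the new mode.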

Add SF port add and delete functionality in switchdev mode.
Destroy all SF ports when the eswitch is disabled.
Expose SF port add and delete to the user via devlink commands.

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88

$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port show ens2f0npf0sf88 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "external": false,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "inactive",
                "opstate": "detached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
Changelog:
v1->v2:
 - split module to devlink interface and hardware table files
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |   4 +
 .../net/ethernet/mellanox/mlx5/core/devlink.c |   5 +
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |  25 ++
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  12 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |  18 +
 .../net/ethernet/mellanox/mlx5/core/sf/cmd.c  |  26 ++
 .../ethernet/mellanox/mlx5/core/sf/devlink.c  | 312 ++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/sf/hw_table.c | 125 +++++++
 .../net/ethernet/mellanox/mlx5/core/sf/priv.h |  17 +
 .../net/ethernet/mellanox/mlx5/core/sf/sf.h   |  28 ++
 include/linux/mlx5/driver.h                   |   6 +
 12 files changed, 583 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index fbbd08cb70c5..63133b6c085b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -89,3 +89,8 @@ mlx5_core-$(CONFIG_MLX5_SW_STEERING) += steering/dr_domain.o steering/dr_table.o
 # SF device
 #
 mlx5_core-$(CONFIG_MLX5_SF) += sf/vhca_event.o sf/dev/dev.o sf/dev/driver.o
+
+#
+# SF manager
+#
+mlx5_core-$(CONFIG_MLX5_SF_MANAGER) += sf/cmd.o sf/hw_table.o sf/devlink.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 47dcc3ac2cf0..e8cecd50558d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -333,6 +333,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_DEALLOC_MEMIC:
 	case MLX5_CMD_OP_PAGE_FAULT_RESUME:
 	case MLX5_CMD_OP_QUERY_ESW_FUNCTIONS:
+	case MLX5_CMD_OP_DEALLOC_SF:
 		return MLX5_CMD_STAT_OK;
 
 	case MLX5_CMD_OP_QUERY_HCA_CAP:
@@ -466,6 +467,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
 	case MLX5_CMD_OP_RELEASE_XRQ_ERROR:
 	case MLX5_CMD_OP_QUERY_VHCA_STATE:
 	case MLX5_CMD_OP_MODIFY_VHCA_STATE:
+	case MLX5_CMD_OP_ALLOC_SF:
 		*status = MLX5_DRIVER_STATUS_ABORTED;
 		*synd = MLX5_DRIVER_SYND;
 		return -EIO;
@@ -661,6 +663,8 @@ const char *mlx5_command_str(int command)
 	MLX5_COMMAND_STR_CASE(MODIFY_XRQ);
 	MLX5_COMMAND_STR_CASE(QUERY_VHCA_STATE);
 	MLX5_COMMAND_STR_CASE(MODIFY_VHCA_STATE);
+	MLX5_COMMAND_STR_CASE(ALLOC_SF);
+	MLX5_COMMAND_STR_CASE(DEALLOC_SF);
 	default: return "unknown command opcode";
 	}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 9afe918c5827..d4c0cdf5edd9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -8,6 +8,7 @@
 #include "fs_core.h"
 #include "eswitch.h"
 #include "sf/dev/dev.h"
+#include "sf/sf.h"
 
 static int mlx5_devlink_flash_update(struct devlink *devlink,
 				     struct devlink_flash_update_params *params,
@@ -190,6 +191,10 @@ static const struct devlink_ops mlx5_devlink_ops = {
 	.eswitch_encap_mode_get = mlx5_devlink_eswitch_encap_mode_get,
 	.port_function_hw_addr_get = mlx5_devlink_port_function_hw_addr_get,
 	.port_function_hw_addr_set = mlx5_devlink_port_function_hw_addr_set,
+#endif
+#ifdef CONFIG_MLX5_SF_MANAGER
+	.port_new = mlx5_devlink_sf_port_new,
+	.port_del = mlx5_devlink_sf_port_del,
 #endif
 	.flash_update = mlx5_devlink_flash_update,
 	.info_get = mlx5_devlink_info_get,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index d06e7a5f15de..86e972c82af7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1600,6 +1600,15 @@ mlx5_eswitch_update_num_of_vfs(struct mlx5_eswitch *esw, int num_vfs)
 	kvfree(out);
 }
 
+static void mlx5_esw_mode_change_notify(struct mlx5_eswitch *esw, u16 mode)
+{
+	struct mlx5_esw_event_info info = {};
+
+	info.new_mode = mode;
+
+	blocking_notifier_call_chain(&esw->n_head, 0, &info);
+}
+
 /**
  * mlx5_eswitch_enable_locked - Enable eswitch
  * @esw:	Pointer to eswitch
@@ -1660,6 +1669,8 @@ int mlx5_eswitch_enable_locked(struct mlx5_eswitch *esw, int mode, int num_vfs)
 		 mode == MLX5_ESWITCH_LEGACY ? "LEGACY" : "OFFLOADS",
 		 esw->esw_funcs.num_vfs, esw->enabled_vports);
 
+	mlx5_esw_mode_change_notify(esw, mode);
+
 	return 0;
 
 abort:
@@ -1716,6 +1727,11 @@ void mlx5_eswitch_disable_locked(struct mlx5_eswitch *esw, bool clear_vf)
 		 esw->mode == MLX5_ESWITCH_LEGACY ? "LEGACY" : "OFFLOADS",
 		 esw->esw_funcs.num_vfs, esw->enabled_vports);
 
+	/* Notify eswitch users that the eswitch is exiting its current mode,
+	 * so that they can do the necessary cleanup before it is disabled.
+	 */
+	mlx5_esw_mode_change_notify(esw, MLX5_ESWITCH_NONE);
+
 	mlx5_eswitch_event_handlers_unregister(esw);
 
 	if (esw->mode == MLX5_ESWITCH_LEGACY)
@@ -1816,6 +1832,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
 	esw->offloads.inline_mode = MLX5_INLINE_MODE_NONE;
 
 	dev->priv.eswitch = esw;
+	BLOCKING_INIT_NOTIFIER_HEAD(&esw->n_head);
 	return 0;
 abort:
 	if (esw->work_queue)
@@ -2507,4 +2524,12 @@ bool mlx5_esw_multipath_prereq(struct mlx5_core_dev *dev0,
 		dev1->priv.eswitch->mode == MLX5_ESWITCH_OFFLOADS);
 }
 
+int mlx5_esw_event_notifier_register(struct mlx5_eswitch *esw, struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&esw->n_head, nb);
+}
 
+void mlx5_esw_event_notifier_unregister(struct mlx5_eswitch *esw, struct notifier_block *nb)
+{
+	blocking_notifier_chain_unregister(&esw->n_head, nb);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 54514b04808d..479d2ac2cd85 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -278,6 +278,7 @@ struct mlx5_eswitch {
 	struct {
 		u32             large_group_num;
 	}  params;
+	struct blocking_notifier_head n_head;
 };
 
 void esw_offloads_disable(struct mlx5_eswitch *esw);
@@ -733,6 +734,17 @@ int mlx5_esw_offloads_sf_vport_enable(struct mlx5_eswitch *esw, struct devlink_p
 				      u16 vport_num, u32 sfnum);
 void mlx5_esw_offloads_sf_vport_disable(struct mlx5_eswitch *esw, u16 vport_num);
 
+/**
+ * struct mlx5_esw_event_info - Indicates eswitch mode changed/changing.
+ *
+ * @new_mode: New mode of eswitch.
+ */
+struct mlx5_esw_event_info {
+	u16 new_mode;
+};
+
+int mlx5_esw_event_notifier_register(struct mlx5_eswitch *esw, struct notifier_block *n);
+void mlx5_esw_event_notifier_unregister(struct mlx5_eswitch *esw, struct notifier_block *n);
 #else  /* CONFIG_MLX5_ESWITCH */
 /* eswitch API stubs */
 static inline int  mlx5_eswitch_init(struct mlx5_core_dev *dev) { return 0; }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 932a280a56a5..435323088ce0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -893,6 +893,18 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 		goto err_fpga_cleanup;
 	}
 
+	err = mlx5_sf_hw_table_init(dev);
+	if (err) {
+		mlx5_core_err(dev, "Failed to init SF HW table %d\n", err);
+		goto err_sf_hw_table_cleanup;
+	}
+
+	err = mlx5_sf_table_init(dev);
+	if (err) {
+		mlx5_core_err(dev, "Failed to init SF table %d\n", err);
+		goto err_sf_table_cleanup;
+	}
+
 	dev->dm = mlx5_dm_create(dev);
 	if (IS_ERR(dev->dm))
 		mlx5_core_warn(dev, "Failed to init device memory%d\n", err);
@@ -903,6 +915,10 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 
 	return 0;
 
+err_sf_table_cleanup:
+	mlx5_sf_hw_table_cleanup(dev);
+err_sf_hw_table_cleanup:
+	mlx5_vhca_event_cleanup(dev);
 err_fpga_cleanup:
 	mlx5_fpga_cleanup(dev);
 err_eswitch_cleanup:
@@ -936,6 +952,8 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 	mlx5_hv_vhca_destroy(dev->hv_vhca);
 	mlx5_fw_tracer_destroy(dev->tracer);
 	mlx5_dm_cleanup(dev);
+	mlx5_sf_table_cleanup(dev);
+	mlx5_sf_hw_table_cleanup(dev);
 	mlx5_vhca_event_cleanup(dev);
 	mlx5_fpga_cleanup(dev);
 	mlx5_eswitch_cleanup(dev->priv.eswitch);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c
new file mode 100644
index 000000000000..13d1eba32d8f
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd */
+
+#include <linux/mlx5/driver.h>
+
+int mlx5_cmd_alloc_sf(struct mlx5_core_dev *dev, u16 function_id)
+{
+	u32 out[MLX5_ST_SZ_DW(alloc_sf_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(alloc_sf_in)] = {};
+
+	MLX5_SET(alloc_sf_in, in, opcode, MLX5_CMD_OP_ALLOC_SF);
+	MLX5_SET(alloc_sf_in, in, function_id, function_id);
+
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
+int mlx5_cmd_dealloc_sf(struct mlx5_core_dev *dev, u16 function_id)
+{
+	u32 out[MLX5_ST_SZ_DW(dealloc_sf_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(dealloc_sf_in)] = {};
+
+	MLX5_SET(dealloc_sf_in, in, opcode, MLX5_CMD_OP_DEALLOC_SF);
+	MLX5_SET(dealloc_sf_in, in, function_id, function_id);
+
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
new file mode 100644
index 000000000000..e6de0e032015
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
@@ -0,0 +1,312 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd */
+
+#include <linux/mlx5/driver.h>
+#include "eswitch.h"
+#include "priv.h"
+
+struct mlx5_sf {
+	struct devlink_port dl_port;
+	unsigned int port_index;
+	u16 id;
+};
+
+struct mlx5_sf_table {
+	struct mlx5_core_dev *dev; /* To refer from notifier context. */
+	struct xarray port_indices; /* port index based lookup. */
+	refcount_t refcount;
+	struct completion disable_complete;
+	struct notifier_block esw_nb;
+};
+
+static struct mlx5_sf *
+mlx5_sf_lookup_by_index(struct mlx5_sf_table *table, unsigned int port_index)
+{
+	return xa_load(&table->port_indices, port_index);
+}
+
+static int mlx5_sf_id_insert(struct mlx5_sf_table *table, struct mlx5_sf *sf)
+{
+	return xa_insert(&table->port_indices, sf->port_index, sf, GFP_KERNEL);
+}
+
+static void mlx5_sf_id_erase(struct mlx5_sf_table *table, struct mlx5_sf *sf)
+{
+	xa_erase(&table->port_indices, sf->port_index);
+}
+
+static struct mlx5_sf *
+mlx5_sf_alloc(struct mlx5_sf_table *table, u32 sfnum, struct netlink_ext_ack *extack)
+{
+	unsigned int dl_port_index;
+	struct mlx5_sf *sf;
+	u16 hw_fn_id;
+	int id_err;
+	int err;
+
+	id_err = mlx5_sf_hw_table_sf_alloc(table->dev, sfnum);
+	if (id_err < 0) {
+		err = id_err;
+		goto id_err;
+	}
+
+	sf = kzalloc(sizeof(*sf), GFP_KERNEL);
+	if (!sf) {
+		err = -ENOMEM;
+		goto alloc_err;
+	}
+	sf->id = id_err;
+	hw_fn_id = mlx5_sf_sw_to_hw_id(table->dev, sf->id);
+	dl_port_index = mlx5_esw_vport_to_devlink_port_index(table->dev, hw_fn_id);
+	sf->port_index = dl_port_index;
+
+	err = mlx5_sf_id_insert(table, sf);
+	if (err)
+		goto insert_err;
+
+	return sf;
+
+insert_err:
+	kfree(sf);
+alloc_err:
+	mlx5_sf_hw_table_sf_free(table->dev, id_err);
+id_err:
+	if (err == -EEXIST)
+		NL_SET_ERR_MSG_MOD(extack, "SF already exists. Choose a different sfnum");
+	return ERR_PTR(err);
+}
+
+static void mlx5_sf_free(struct mlx5_sf_table *table, struct mlx5_sf *sf)
+{
+	mlx5_sf_id_erase(table, sf);
+	mlx5_sf_hw_table_sf_free(table->dev, sf->id);
+	kfree(sf);
+}
+
+static struct mlx5_sf_table *mlx5_sf_table_try_get(struct mlx5_core_dev *dev)
+{
+	struct mlx5_sf_table *table = dev->priv.sf_table;
+
+	if (!table)
+		return NULL;
+
+	return refcount_inc_not_zero(&table->refcount) ? table : NULL;
+}
+
+static void mlx5_sf_table_put(struct mlx5_sf_table *table)
+{
+	if (refcount_dec_and_test(&table->refcount))
+		complete(&table->disable_complete);
+}
+
+static int mlx5_sf_add(struct mlx5_core_dev *dev, struct mlx5_sf_table *table,
+		       const struct devlink_port_new_attrs *new_attr,
+		       struct netlink_ext_ack *extack)
+{
+	struct mlx5_eswitch *esw = dev->priv.eswitch;
+	struct mlx5_sf *sf;
+	u16 hw_fn_id;
+	int err;
+
+	sf = mlx5_sf_alloc(table, new_attr->sfnum, extack);
+	if (IS_ERR(sf))
+		return PTR_ERR(sf);
+
+	hw_fn_id = mlx5_sf_sw_to_hw_id(dev, sf->id);
+	err = mlx5_esw_offloads_sf_vport_enable(esw, &sf->dl_port, hw_fn_id, new_attr->sfnum);
+	if (err)
+		goto esw_err;
+	return 0;
+
+esw_err:
+	mlx5_sf_free(table, sf);
+	return err;
+}
+
+static void mlx5_sf_del(struct mlx5_core_dev *dev, struct mlx5_sf_table *table, struct mlx5_sf *sf)
+{
+	struct mlx5_eswitch *esw = dev->priv.eswitch;
+	u16 hw_fn_id;
+
+	hw_fn_id = mlx5_sf_sw_to_hw_id(dev, sf->id);
+	mlx5_esw_offloads_sf_vport_disable(esw, hw_fn_id);
+	mlx5_sf_free(table, sf);
+}
+
+static int
+mlx5_sf_new_check_attr(struct mlx5_core_dev *dev, const struct devlink_port_new_attrs *new_attr,
+		       struct netlink_ext_ack *extack)
+{
+	if (new_attr->flavour != DEVLINK_PORT_FLAVOUR_PCI_SF) {
+		NL_SET_ERR_MSG_MOD(extack, "Driver supports only SF port addition");
+		return -EOPNOTSUPP;
+	}
+	if (new_attr->port_index_valid) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Driver does not support user defined port index assignment");
+		return -EOPNOTSUPP;
+	}
+	if (!new_attr->sfnum_valid) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "User must provide unique sfnum. Driver does not support auto assignment");
+		return -EOPNOTSUPP;
+	}
+	if (new_attr->controller_valid && new_attr->controller) {
+		NL_SET_ERR_MSG_MOD(extack, "External controller is unsupported");
+		return -EOPNOTSUPP;
+	}
+	if (new_attr->pfnum != PCI_FUNC(dev->pdev->devfn)) {
+		NL_SET_ERR_MSG_MOD(extack, "Invalid pfnum supplied");
+		return -EOPNOTSUPP;
+	}
+	return 0;
+}
+
+int mlx5_devlink_sf_port_new(struct devlink *devlink, const struct devlink_port_new_attrs *new_attr,
+			     struct netlink_ext_ack *extack)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	struct mlx5_sf_table *table;
+	int err;
+
+	err = mlx5_sf_new_check_attr(dev, new_attr, extack);
+	if (err)
+		return err;
+
+	table = mlx5_sf_table_try_get(dev);
+	if (!table) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Port add is only supported in eswitch switchdev mode or SF ports are disabled.");
+		return -EOPNOTSUPP;
+	}
+	err = mlx5_sf_add(dev, table, new_attr, extack);
+	mlx5_sf_table_put(table);
+	return err;
+}
+
+int mlx5_devlink_sf_port_del(struct devlink *devlink, unsigned int port_index,
+			     struct netlink_ext_ack *extack)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	struct mlx5_sf_table *table;
+	struct mlx5_sf *sf;
+	int err = 0;
+
+	table = mlx5_sf_table_try_get(dev);
+	if (!table) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Port del is only supported in eswitch switchdev mode or SF ports are disabled.");
+		return -EOPNOTSUPP;
+	}
+	sf = mlx5_sf_lookup_by_index(table, port_index);
+	if (!sf) {
+		err = -ENODEV;
+		goto sf_err;
+	}
+
+	mlx5_sf_del(dev, table, sf);
+sf_err:
+	mlx5_sf_table_put(table);
+	return err;
+}
+
+static void mlx5_sf_destroy_all(struct mlx5_sf_table *table)
+{
+	struct mlx5_core_dev *dev = table->dev;
+	unsigned long index;
+	struct mlx5_sf *sf;
+
+	xa_for_each(&table->port_indices, index, sf)
+		mlx5_sf_del(dev, table, sf);
+}
+
+static void mlx5_sf_table_enable(struct mlx5_sf_table *table)
+{
+	if (!mlx5_sf_max_functions(table->dev))
+		return;
+
+	init_completion(&table->disable_complete);
+	refcount_set(&table->refcount, 1);
+}
+
+void mlx5_sf_table_disable(struct mlx5_sf_table *table)
+{
+	if (!mlx5_sf_max_functions(table->dev))
+		return;
+
+	if (!refcount_read(&table->refcount))
+		return;
+
+	/* Balances with refcount_set; drop the reference so that new user cmd cannot start. */
+	mlx5_sf_table_put(table);
+	wait_for_completion(&table->disable_complete);
+
+	/* At this point, no new user commands can start.
+	 * It is safe to destroy all user created SFs.
+	 */
+	mlx5_sf_destroy_all(table);
+}
+
+static int mlx5_sf_esw_event(struct notifier_block *nb, unsigned long event, void *data)
+{
+	struct mlx5_sf_table *table = container_of(nb, struct mlx5_sf_table, esw_nb);
+	const struct mlx5_esw_event_info *mode = data;
+
+	switch (mode->new_mode) {
+	case MLX5_ESWITCH_OFFLOADS:
+		mlx5_sf_table_enable(table);
+		break;
+	case MLX5_ESWITCH_NONE:
+		mlx5_sf_table_disable(table);
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static bool mlx5_sf_table_supported(const struct mlx5_core_dev *dev)
+{
+	return dev->priv.eswitch && MLX5_ESWITCH_MANAGER(dev) && mlx5_sf_supported(dev);
+}
+
+int mlx5_sf_table_init(struct mlx5_core_dev *dev)
+{
+	struct mlx5_sf_table *table;
+	int err;
+
+	if (!mlx5_sf_table_supported(dev))
+		return 0;
+
+	table = kzalloc(sizeof(*table), GFP_KERNEL);
+	if (!table)
+		return -ENOMEM;
+
+	table->dev = dev;
+	xa_init(&table->port_indices);
+	dev->priv.sf_table = table;
+	table->esw_nb.notifier_call = mlx5_sf_esw_event;
+	err = mlx5_esw_event_notifier_register(dev->priv.eswitch, &table->esw_nb);
+	if (err)
+		goto reg_err;
+	return 0;
+
+reg_err:
+	kfree(table);
+	dev->priv.sf_table = NULL;
+	return err;
+}
+
+void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev)
+{
+	struct mlx5_sf_table *table = dev->priv.sf_table;
+
+	if (!table)
+		return;
+
+	mlx5_esw_event_notifier_unregister(dev->priv.eswitch, &table->esw_nb);
+	WARN_ON(refcount_read(&table->refcount));
+	WARN_ON(!xa_empty(&table->port_indices));
+	kfree(table);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
new file mode 100644
index 000000000000..c7757f399e8a
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd */
+#include <linux/mlx5/driver.h>
+#include "vhca_event.h"
+#include "priv.h"
+#include "sf.h"
+#include "ecpf.h"
+
+struct mlx5_sf_hw {
+	u32 usr_sfnum;
+	u8 allocated: 1;
+};
+
+struct mlx5_sf_hw_table {
+	struct mlx5_core_dev *dev;
+	struct mlx5_sf_hw *sfs;
+	int max_local_functions;
+	u8 ecpu: 1;
+};
+
+u16 mlx5_sf_sw_to_hw_id(const struct mlx5_core_dev *dev, u16 sw_id)
+{
+	return sw_id + mlx5_sf_start_function_id(dev);
+}
+
+int mlx5_sf_hw_table_sf_alloc(struct mlx5_core_dev *dev, u32 usr_sfnum)
+{
+	struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table;
+	int sw_id = -ENOSPC;
+	u16 hw_fn_id;
+	int err;
+	int i;
+
+	if (!table->max_local_functions)
+		return -EOPNOTSUPP;
+
+	/* Check if sf with same sfnum already exists or not. */
+	for (i = 0; i < table->max_local_functions; i++) {
+		if (table->sfs[i].allocated && table->sfs[i].usr_sfnum == usr_sfnum)
+			return -EEXIST;
+	}
+
+	/* Find the free entry and allocate the entry from the array */
+	for (i = 0; i < table->max_local_functions; i++) {
+		if (!table->sfs[i].allocated) {
+			table->sfs[i].usr_sfnum = usr_sfnum;
+			table->sfs[i].allocated = true;
+			sw_id = i;
+			break;
+		}
+	}
+	if (sw_id == -ENOSPC) {
+		err = -ENOSPC;
+		goto err;
+	}
+
+	hw_fn_id = mlx5_sf_sw_to_hw_id(table->dev, sw_id);
+	err = mlx5_cmd_alloc_sf(table->dev, hw_fn_id);
+	if (err)
+		goto err;
+
+	err = mlx5_modify_vhca_sw_id(dev, hw_fn_id, table->ecpu, usr_sfnum);
+	if (err)
+		goto vhca_err;
+
+	return sw_id;
+
+vhca_err:
+	mlx5_cmd_dealloc_sf(table->dev, hw_fn_id);
+err:
+	table->sfs[i].allocated = false;
+	return err;
+}
+
+void mlx5_sf_hw_table_sf_free(struct mlx5_core_dev *dev, u16 id)
+{
+	struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table;
+	u16 hw_fn_id;
+
+	hw_fn_id = mlx5_sf_sw_to_hw_id(table->dev, id);
+	mlx5_cmd_dealloc_sf(table->dev, hw_fn_id);
+	table->sfs[id].allocated = false;
+}
+
+int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev)
+{
+	struct mlx5_sf_hw_table *table;
+	struct mlx5_sf_hw *sfs;
+	int max_functions;
+
+	if (!mlx5_sf_supported(dev))
+		return 0;
+
+	max_functions = mlx5_sf_max_functions(dev);
+	table = kzalloc(sizeof(*table), GFP_KERNEL);
+	if (!table)
+		return -ENOMEM;
+
+	sfs = kcalloc(max_functions, sizeof(*sfs), GFP_KERNEL);
+	if (!sfs)
+		goto table_err;
+
+	table->dev = dev;
+	table->sfs = sfs;
+	table->max_local_functions = max_functions;
+	table->ecpu = mlx5_read_embedded_cpu(dev);
+	dev->priv.sf_hw_table = table;
+	mlx5_core_dbg(dev, "SF HW table: max sfs = %d\n", max_functions);
+	return 0;
+
+table_err:
+	kfree(table);
+	return -ENOMEM;
+}
+
+void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev)
+{
+	struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table;
+
+	if (!table)
+		return;
+
+	kfree(table->sfs);
+	kfree(table);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h
new file mode 100644
index 000000000000..7f3622375a9c
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies Ltd */
+
+#ifndef __MLX5_SF_PRIV_H__
+#define __MLX5_SF_PRIV_H__
+
+#include <linux/mlx5/driver.h>
+
+int mlx5_cmd_alloc_sf(struct mlx5_core_dev *dev, u16 function_id);
+int mlx5_cmd_dealloc_sf(struct mlx5_core_dev *dev, u16 function_id);
+
+u16 mlx5_sf_sw_to_hw_id(const struct mlx5_core_dev *dev, u16 sw_id);
+
+int mlx5_sf_hw_table_sf_alloc(struct mlx5_core_dev *dev, u32 usr_sfnum);
+void mlx5_sf_hw_table_sf_free(struct mlx5_core_dev *dev, u16 id);
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h
index 623191679b49..dd23b6c2d887 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h
@@ -28,6 +28,16 @@ static inline u16 mlx5_sf_max_functions(const struct mlx5_core_dev *dev)
 		return 1 << MLX5_CAP_GEN(dev, log_max_sf);
 }
 
+int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev);
+void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev);
+
+int mlx5_sf_table_init(struct mlx5_core_dev *dev);
+void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev);
+
+int mlx5_devlink_sf_port_new(struct devlink *devlink, const struct devlink_port_new_attrs *add_attr,
+			     struct netlink_ext_ack *extack);
+int mlx5_devlink_sf_port_del(struct devlink *devlink, unsigned int port_index,
+			     struct netlink_ext_ack *extack);
 #else
 
 static inline bool mlx5_sf_supported(const struct mlx5_core_dev *dev)
@@ -40,6 +50,24 @@ static inline u16 mlx5_sf_max_functions(const struct mlx5_core_dev *dev)
 	return 0;
 }
 
+static inline int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev)
+{
+	return 0;
+}
+
+static inline void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev)
+{
+}
+
+static inline int mlx5_sf_table_init(struct mlx5_core_dev *dev)
+{
+	return 0;
+}
+
+static inline void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev)
+{
+}
+
 #endif
 
 #endif
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 48e3638b1185..7e357c7f0d5e 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -510,6 +510,8 @@ struct mlx5_eq_table;
 struct mlx5_irq_table;
 struct mlx5_vhca_state_notifier;
 struct mlx5_sf_dev_table;
+struct mlx5_sf_hw_table;
+struct mlx5_sf_table;
 
 struct mlx5_rate_limit {
 	u32			rate;
@@ -611,6 +613,10 @@ struct mlx5_priv {
 	struct mlx5_sf_dev_table *sf_dev_table;
 	struct mlx5_core_dev *parent_mdev;
 #endif
+#ifdef CONFIG_MLX5_SF_MANAGER
+	struct mlx5_sf_hw_table *sf_hw_table;
+	struct mlx5_sf_table *sf_table;
+#endif
 };
 
 enum mlx5_device_state {
-- 
2.26.2


* [PATCH net-next v2 11/14] net/mlx5: SF, Port function state change support
  2020-12-09  7:29 [PATCH net-next v2 00/14] Add mlx5 subfunction support saeed
                   ` (9 preceding siblings ...)
  2020-12-09  7:29 ` [PATCH net-next v2 10/14] net/mlx5: SF, Add port add delete functionality saeed
@ 2020-12-09  7:29 ` saeed
  2020-12-09  7:29 ` [PATCH net-next v2 12/14] devlink: Add devlink port documentation saeed
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: saeed @ 2020-12-09  7:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit, Vu Pham, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

Support changing the state of the SF port's function through devlink.
When activating the SF port's function, enable the vHCA in the device,
then add its auxiliary device.
When deactivating the SF port's function, delete its auxiliary device,
then disable the vHCA.
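
As a rough illustration of how the vHCA state is reported to devlink, the
userspace sketch below mirrors the mapping done by
mlx5_sf_to_devlink_state() and mlx5_sf_to_devlink_opstate() in this patch.
The enum names and numeric values are hypothetical stand-ins, not the
driver's MLX5_VHCA_STATE_* or DEVLINK_PORT_FUNCTION_* definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for MLX5_VHCA_STATE_*; values are illustrative. */
enum vhca_state {
	VHCA_INVALID,
	VHCA_ALLOCATED,
	VHCA_ACTIVE,
	VHCA_IN_USE,
	VHCA_TEARDOWN_REQUEST,
};

/* Hypothetical stand-ins for the devlink state and opstate enums. */
enum fn_state { FN_INACTIVE, FN_ACTIVE };
enum fn_opstate { FN_DETACHED, FN_ATTACHED };

/* Administrative state: active from enable until teardown has completed. */
static enum fn_state to_state(enum vhca_state s)
{
	switch (s) {
	case VHCA_ACTIVE:
	case VHCA_IN_USE:
	case VHCA_TEARDOWN_REQUEST:
		return FN_ACTIVE;
	default:
		return FN_INACTIVE;
	}
}

/* Operational state: attached only while a driver is bound to the function. */
static enum fn_opstate to_opstate(enum vhca_state s)
{
	switch (s) {
	case VHCA_IN_USE:
	case VHCA_TEARDOWN_REQUEST:
		return FN_ATTACHED;
	default:
		return FN_DETACHED;
	}
}
```

In this model a function that is enabled but has no driver bound yet
reports state active with opstate detached.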

Port function attribute get/set callbacks are invoked with the devlink
instance lock held. These callbacks need to synchronize with the SF port
table getting disabled via the sriov sysfs callback. They do so by
holding a reference on the table, which the table disable context
waits for.
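
The try-get/put scheme can be modelled in userspace roughly as below. This
is only a sketch under stated assumptions: the struct and function names
are hypothetical, C11 atomics stand in for refcount_t, and a flag stands
in for the kernel completion that the disable path waits on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the SF table; not the driver's struct. */
struct sf_table_model {
	atomic_int refcount;		/* 0 means the table is disabled */
	atomic_bool disable_complete;	/* stands in for struct completion */
};

/* Models table enable: the initial reference is owned by the table itself. */
static void table_enable(struct sf_table_model *t)
{
	atomic_store(&t->disable_complete, false);
	atomic_store(&t->refcount, 1);
}

/* Take a reference unless the count has already dropped to zero. */
static bool table_try_get(struct sf_table_model *t)
{
	int old = atomic_load(&t->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&t->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the last put signals the waiting disable context. */
static void table_put(struct sf_table_model *t)
{
	if (atomic_fetch_sub(&t->refcount, 1) == 1)
		atomic_store(&t->disable_complete, true);
}
```

The disable path drops the initial reference and then waits for the flag,
so it proceeds only after every in-flight callback has done its put, and
any callback arriving later fails its try-get.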

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88

$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active

$ devlink port show ens2f0npf0sf88 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "external": false,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:88:88",
                "state": "active",
                "opstate": "attached"
            }
        }
    }
}

On port function activation, an auxiliary device is created, as in the
example below.

$ devlink dev show
devlink dev show auxiliary/mlx5_core.sf.4

$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
Changelog:
v1->v2:
 - split module to devlink interface and hardware table
 - using vhca event from device for state transition
---
 .../net/ethernet/mellanox/mlx5/core/devlink.c |   2 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |  10 +
 .../net/ethernet/mellanox/mlx5/core/sf/cmd.c  |  22 ++
 .../ethernet/mellanox/mlx5/core/sf/devlink.c  | 284 ++++++++++++++++--
 .../ethernet/mellanox/mlx5/core/sf/hw_table.c | 118 +++++++-
 .../net/ethernet/mellanox/mlx5/core/sf/priv.h |   4 +
 .../net/ethernet/mellanox/mlx5/core/sf/sf.h   |  19 ++
 7 files changed, 433 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index d4c0cdf5edd9..75d950d95fcf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -195,6 +195,8 @@ static const struct devlink_ops mlx5_devlink_ops = {
 #ifdef CONFIG_MLX5_SF_MANAGER
 	.port_new = mlx5_devlink_sf_port_new,
 	.port_del = mlx5_devlink_sf_port_del,
+	.port_function_state_get = mlx5_devlink_sf_port_fn_state_get,
+	.port_function_state_set = mlx5_devlink_sf_port_fn_state_set,
 #endif
 	.flash_update = mlx5_devlink_flash_update,
 	.info_get = mlx5_devlink_info_get,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 435323088ce0..f6b885fdd5c8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -75,6 +75,7 @@
 #include "diag/rsc_dump.h"
 #include "sf/vhca_event.h"
 #include "sf/dev/dev.h"
+#include "sf/sf.h"
 
 MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>");
 MODULE_DESCRIPTION("Mellanox 5th generation network adapters (ConnectX series) core driver");
@@ -1161,6 +1162,12 @@ static int mlx5_load(struct mlx5_core_dev *dev)
 
 	mlx5_vhca_event_start(dev);
 
+	err = mlx5_sf_hw_table_create(dev);
+	if (err) {
+		mlx5_core_err(dev, "sf table create failed %d\n", err);
+		goto err_vhca;
+	}
+
 	err = mlx5_ec_init(dev);
 	if (err) {
 		mlx5_core_err(dev, "Failed to init embedded CPU\n");
@@ -1180,6 +1187,8 @@ static int mlx5_load(struct mlx5_core_dev *dev)
 err_sriov:
 	mlx5_ec_cleanup(dev);
 err_ec:
+	mlx5_sf_hw_table_destroy(dev);
+err_vhca:
 	mlx5_vhca_event_stop(dev);
 	mlx5_cleanup_fs(dev);
 err_fs:
@@ -1209,6 +1218,7 @@ static void mlx5_unload(struct mlx5_core_dev *dev)
 	mlx5_sf_dev_table_destroy(dev);
 	mlx5_sriov_detach(dev);
 	mlx5_ec_cleanup(dev);
+	mlx5_sf_hw_table_destroy(dev);
 	mlx5_vhca_event_stop(dev);
 	mlx5_cleanup_fs(dev);
 	mlx5_accel_ipsec_cleanup(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c
index 13d1eba32d8f..8dd44a2b2467 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c
@@ -24,3 +24,25 @@ int mlx5_cmd_dealloc_sf(struct mlx5_core_dev *dev, u16 function_id)
 
 	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
+
+int mlx5_cmd_sf_enable_hca(struct mlx5_core_dev *dev, u16 func_id)
+{
+	u32 out[MLX5_ST_SZ_DW(enable_hca_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(enable_hca_in)] = {};
+
+	MLX5_SET(enable_hca_in, in, opcode, MLX5_CMD_OP_ENABLE_HCA);
+	MLX5_SET(enable_hca_in, in, function_id, func_id);
+	MLX5_SET(enable_hca_in, in, embedded_cpu_function, 0);
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
+int mlx5_cmd_sf_disable_hca(struct mlx5_core_dev *dev, u16 func_id)
+{
+	u32 out[MLX5_ST_SZ_DW(disable_hca_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(disable_hca_in)] = {};
+
+	MLX5_SET(disable_hca_in, in, opcode, MLX5_CMD_OP_DISABLE_HCA);
+	MLX5_SET(disable_hca_in, in, function_id, func_id);
+	MLX5_SET(disable_hca_in, in, embedded_cpu_function, 0);
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
index e6de0e032015..ef81a4e91327 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
@@ -4,11 +4,17 @@
 #include <linux/mlx5/driver.h>
 #include "eswitch.h"
 #include "priv.h"
+#include "sf/dev/dev.h"
+#include "mlx5_ifc_vhca_event.h"
+#include "vhca_event.h"
+#include "ecpf.h"
 
 struct mlx5_sf {
 	struct devlink_port dl_port;
 	unsigned int port_index;
 	u16 id;
+	u16 hw_fn_id;
+	u16 hw_state;
 };
 
 struct mlx5_sf_table {
@@ -16,7 +22,10 @@ struct mlx5_sf_table {
 	struct xarray port_indices; /* port index based lookup. */
 	refcount_t refcount;
 	struct completion disable_complete;
+	struct mutex sf_state_lock; /* Serializes sf state among user cmds & vhca event handler. */
 	struct notifier_block esw_nb;
+	struct notifier_block vhca_nb;
+	u8 ecpu: 1;
 };
 
 static struct mlx5_sf *
@@ -25,6 +34,19 @@ mlx5_sf_lookup_by_index(struct mlx5_sf_table *table, unsigned int port_index)
 	return xa_load(&table->port_indices, port_index);
 }
 
+static struct mlx5_sf *
+mlx5_sf_lookup_by_function_id(struct mlx5_sf_table *table, unsigned int fn_id)
+{
+	unsigned long index;
+	struct mlx5_sf *sf;
+
+	xa_for_each(&table->port_indices, index, sf) {
+		if (sf->hw_fn_id == fn_id)
+			return sf;
+	}
+	return NULL;
+}
+
 static int mlx5_sf_id_insert(struct mlx5_sf_table *table, struct mlx5_sf *sf)
 {
 	return xa_insert(&table->port_indices, sf->port_index, sf, GFP_KERNEL);
@@ -59,6 +81,8 @@ mlx5_sf_alloc(struct mlx5_sf_table *table, u32 sfnum, struct netlink_ext_ack *ex
 	hw_fn_id = mlx5_sf_sw_to_hw_id(table->dev, sf->id);
 	dl_port_index = mlx5_esw_vport_to_devlink_port_index(table->dev, hw_fn_id);
 	sf->port_index = dl_port_index;
+	sf->hw_fn_id = hw_fn_id;
+	sf->hw_state = MLX5_VHCA_STATE_ALLOCATED;
 
 	err = mlx5_sf_id_insert(table, sf);
 	if (err)
@@ -99,6 +123,146 @@ static void mlx5_sf_table_put(struct mlx5_sf_table *table)
 		complete(&table->disable_complete);
 }
 
+static enum devlink_port_function_state mlx5_sf_to_devlink_state(u8 hw_state)
+{
+	switch (hw_state) {
+	case MLX5_VHCA_STATE_ACTIVE:
+	case MLX5_VHCA_STATE_IN_USE:
+	case MLX5_VHCA_STATE_TEARDOWN_REQUEST:
+		return DEVLINK_PORT_FUNCTION_STATE_ACTIVE;
+	case MLX5_VHCA_STATE_INVALID:
+	case MLX5_VHCA_STATE_ALLOCATED:
+	default:
+		return DEVLINK_PORT_FUNCTION_STATE_INACTIVE;
+	}
+}
+
+static enum devlink_port_function_opstate mlx5_sf_to_devlink_opstate(u8 hw_state)
+{
+	switch (hw_state) {
+	case MLX5_VHCA_STATE_IN_USE:
+	case MLX5_VHCA_STATE_TEARDOWN_REQUEST:
+		return DEVLINK_PORT_FUNCTION_OPSTATE_ATTACHED;
+	case MLX5_VHCA_STATE_INVALID:
+	case MLX5_VHCA_STATE_ALLOCATED:
+	case MLX5_VHCA_STATE_ACTIVE:
+	default:
+		return DEVLINK_PORT_FUNCTION_OPSTATE_DETACHED;
+	}
+}
+
+static bool mlx5_sf_is_active(const struct mlx5_sf *sf)
+{
+	return sf->hw_state == MLX5_VHCA_STATE_ACTIVE || sf->hw_state == MLX5_VHCA_STATE_IN_USE;
+}
+
+int mlx5_devlink_sf_port_fn_state_get(struct devlink *devlink, struct devlink_port *dl_port,
+				      enum devlink_port_function_state *state,
+				      enum devlink_port_function_opstate *opstate,
+				      struct netlink_ext_ack *extack)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	struct mlx5_sf_table *table;
+	struct mlx5_sf *sf;
+	int err = 0;
+
+	table = mlx5_sf_table_try_get(dev);
+	if (!table)
+		return -EOPNOTSUPP;
+
+	sf = mlx5_sf_lookup_by_index(table, dl_port->index);
+	if (!sf) {
+		err = -EOPNOTSUPP;
+		goto sf_err;
+	}
+	mutex_lock(&table->sf_state_lock);
+	*state = mlx5_sf_to_devlink_state(sf->hw_state);
+	*opstate = mlx5_sf_to_devlink_opstate(sf->hw_state);
+	mutex_unlock(&table->sf_state_lock);
+sf_err:
+	mlx5_sf_table_put(table);
+	return err;
+}
+
+static int mlx5_sf_activate(struct mlx5_core_dev *dev, struct mlx5_sf *sf)
+{
+	int err;
+
+	if (mlx5_sf_is_active(sf))
+		return 0;
+	if (sf->hw_state != MLX5_VHCA_STATE_ALLOCATED)
+		return -EINVAL;
+
+	err = mlx5_cmd_sf_enable_hca(dev, sf->hw_fn_id);
+	if (err)
+		return err;
+
+	sf->hw_state = MLX5_VHCA_STATE_ACTIVE;
+	return 0;
+}
+
+static int mlx5_sf_deactivate(struct mlx5_core_dev *dev, struct mlx5_sf *sf)
+{
+	int err;
+
+	if (!mlx5_sf_is_active(sf))
+		return 0;
+
+	err = mlx5_cmd_sf_disable_hca(dev, sf->hw_fn_id);
+	if (err)
+		return err;
+
+	sf->hw_state = MLX5_VHCA_STATE_TEARDOWN_REQUEST;
+	return 0;
+}
+
+static int mlx5_sf_state_set(struct mlx5_core_dev *dev, struct mlx5_sf_table *table,
+			     struct mlx5_sf *sf,
+			     enum devlink_port_function_state state)
+{
+	int err = 0;
+
+	mutex_lock(&table->sf_state_lock);
+	if (state == mlx5_sf_to_devlink_state(sf->hw_state))
+		goto out;
+	if (state == DEVLINK_PORT_FUNCTION_STATE_ACTIVE)
+		err = mlx5_sf_activate(dev, sf);
+	else if (state == DEVLINK_PORT_FUNCTION_STATE_INACTIVE)
+		err = mlx5_sf_deactivate(dev, sf);
+	else
+		err = -EINVAL;
+out:
+	mutex_unlock(&table->sf_state_lock);
+	return err;
+}
+
+int mlx5_devlink_sf_port_fn_state_set(struct devlink *devlink, struct devlink_port *dl_port,
+				      enum devlink_port_function_state state,
+				      struct netlink_ext_ack *extack)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	struct mlx5_sf_table *table;
+	struct mlx5_sf *sf;
+	int err;
+
+	table = mlx5_sf_table_try_get(dev);
+	if (!table) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Port state set is only supported in eswitch switchdev mode or SF ports are disabled.");
+		return -EOPNOTSUPP;
+	}
+	sf = mlx5_sf_lookup_by_index(table, dl_port->index);
+	if (!sf) {
+		err = -ENODEV;
+		goto out;
+	}
+
+	err = mlx5_sf_state_set(dev, table, sf, state);
+out:
+	mlx5_sf_table_put(table);
+	return err;
+}
+
 static int mlx5_sf_add(struct mlx5_core_dev *dev, struct mlx5_sf_table *table,
 		       const struct devlink_port_new_attrs *new_attr,
 		       struct netlink_ext_ack *extack)
@@ -123,16 +287,6 @@ static int mlx5_sf_add(struct mlx5_core_dev *dev, struct mlx5_sf_table *table,
 	return err;
 }
 
-static void mlx5_sf_del(struct mlx5_core_dev *dev, struct mlx5_sf_table *table, struct mlx5_sf *sf)
-{
-	struct mlx5_eswitch *esw = dev->priv.eswitch;
-	u16 hw_fn_id;
-
-	hw_fn_id = mlx5_sf_sw_to_hw_id(dev, sf->id);
-	mlx5_esw_offloads_sf_vport_disable(esw, hw_fn_id);
-	mlx5_sf_free(table, sf);
-}
-
 static int
 mlx5_sf_new_check_attr(struct mlx5_core_dev *dev, const struct devlink_port_new_attrs *new_attr,
 		       struct netlink_ext_ack *extack)
@@ -184,10 +338,30 @@ int mlx5_devlink_sf_port_new(struct devlink *devlink, const struct devlink_port_
 	return err;
 }
 
+static void mlx5_sf_dealloc(struct mlx5_sf_table *table, struct mlx5_sf *sf)
+{
+	if (sf->hw_state == MLX5_VHCA_STATE_ALLOCATED) {
+		mlx5_sf_free(table, sf);
+	} else if (mlx5_sf_is_active(sf)) {
+		/* Even if it is active, it is treated as in_use because by the
+		 * time it is disabled here, it may be in use. So it is safe to
+		 * always look for the event to ensure that it is recycled only after
+		 * firmware gives confirmation that it is detached by the driver.
+		 */
+		mlx5_cmd_sf_disable_hca(table->dev, sf->hw_fn_id);
+		mlx5_sf_hw_table_sf_deferred_free(table->dev, sf->id);
+		kfree(sf);
+	} else {
+		mlx5_sf_hw_table_sf_deferred_free(table->dev, sf->id);
+		kfree(sf);
+	}
+}
+
 int mlx5_devlink_sf_port_del(struct devlink *devlink, unsigned int port_index,
 			     struct netlink_ext_ack *extack)
 {
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	struct mlx5_eswitch *esw = dev->priv.eswitch;
 	struct mlx5_sf_table *table;
 	struct mlx5_sf *sf;
 	int err = 0;
@@ -204,20 +378,58 @@ int mlx5_devlink_sf_port_del(struct devlink *devlink, unsigned int port_index,
 		goto sf_err;
 	}
 
-	mlx5_sf_del(dev, table, sf);
+	mlx5_esw_offloads_sf_vport_disable(esw, sf->hw_fn_id);
+	mlx5_sf_id_erase(table, sf);
+
+	mutex_lock(&table->sf_state_lock);
+	mlx5_sf_dealloc(table, sf);
+	mutex_unlock(&table->sf_state_lock);
 sf_err:
 	mlx5_sf_table_put(table);
 	return err;
 }
 
-static void mlx5_sf_destroy_all(struct mlx5_sf_table *table)
+static bool mlx5_sf_state_update_check(const struct mlx5_sf *sf, u8 new_state)
 {
-	struct mlx5_core_dev *dev = table->dev;
-	unsigned long index;
+	if (sf->hw_state == MLX5_VHCA_STATE_ACTIVE && new_state == MLX5_VHCA_STATE_IN_USE)
+		return true;
+
+	if (sf->hw_state == MLX5_VHCA_STATE_IN_USE && new_state == MLX5_VHCA_STATE_ACTIVE)
+		return true;
+
+	if (sf->hw_state == MLX5_VHCA_STATE_TEARDOWN_REQUEST &&
+	    new_state == MLX5_VHCA_STATE_ALLOCATED)
+		return true;
+
+	return false;
+}
+
+static int mlx5_sf_vhca_event(struct notifier_block *nb, unsigned long opcode, void *data)
+{
+	struct mlx5_sf_table *table = container_of(nb, struct mlx5_sf_table, vhca_nb);
+	const struct mlx5_vhca_state_event *event = data;
+	bool update = false;
 	struct mlx5_sf *sf;
 
-	xa_for_each(&table->port_indices, index, sf)
-		mlx5_sf_del(dev, table, sf);
+	table = mlx5_sf_table_try_get(table->dev);
+	if (!table)
+		return 0;
+
+	mutex_lock(&table->sf_state_lock);
+	sf = mlx5_sf_lookup_by_function_id(table, event->function_id);
+	if (!sf)
+		goto sf_err;
+
+	/* When driver is attached or detached to a function, an event
+	 * notifies such state change.
+	 */
+	update = mlx5_sf_state_update_check(sf, event->new_vhca_state);
+	if (update)
+		sf->hw_state = event->new_vhca_state;
+sf_err:
+	mutex_unlock(&table->sf_state_lock);
+	mlx5_sf_table_put(table);
+	return 0;
 }
 
 static void mlx5_sf_table_enable(struct mlx5_sf_table *table)
@@ -229,6 +441,22 @@ static void mlx5_sf_table_enable(struct mlx5_sf_table *table)
 	refcount_set(&table->refcount, 1);
 }
 
+static void mlx5_sf_deactivate_all(struct mlx5_sf_table *table)
+{
+	struct mlx5_eswitch *esw = table->dev->priv.eswitch;
+	unsigned long index;
+	struct mlx5_sf *sf;
+
+	/* At this point, no new user commands can start and no vhca event can
+	 * arrive. It is safe to destroy all user created SFs.
+	 */
+	xa_for_each(&table->port_indices, index, sf) {
+		mlx5_esw_offloads_sf_vport_disable(esw, sf->hw_fn_id);
+		mlx5_sf_id_erase(table, sf);
+		mlx5_sf_dealloc(table, sf);
+	}
+}
+
 void mlx5_sf_table_disable(struct mlx5_sf_table *table)
 {
 	if (!mlx5_sf_max_functions(table->dev))
@@ -237,14 +465,13 @@ void mlx5_sf_table_disable(struct mlx5_sf_table *table)
 	if (!refcount_read(&table->refcount))
 		return;
 
-	/* Balances with refcount_set; drop the reference so that new user cmd cannot start. */
+	/* Balances with refcount_set; drop the reference so that new user cmd cannot start
+	 * and new vhca event handler cannot run.
+	 */
 	mlx5_sf_table_put(table);
 	wait_for_completion(&table->disable_complete);
 
-	/* At this point, no new user commands can start.
-	 * It is safe to destroy all user created SFs.
-	 */
-	mlx5_sf_destroy_all(table);
+	mlx5_sf_deactivate_all(table);
 }
 
 static int mlx5_sf_esw_event(struct notifier_block *nb, unsigned long event, void *data)
@@ -276,23 +503,34 @@ int mlx5_sf_table_init(struct mlx5_core_dev *dev)
 	struct mlx5_sf_table *table;
 	int err;
 
-	if (!mlx5_sf_table_supported(dev))
+	if (!mlx5_sf_table_supported(dev) || !mlx5_vhca_event_supported(dev))
 		return 0;
 
 	table = kzalloc(sizeof(*table), GFP_KERNEL);
 	if (!table)
 		return -ENOMEM;
 
+	mutex_init(&table->sf_state_lock);
 	table->dev = dev;
 	xa_init(&table->port_indices);
 	dev->priv.sf_table = table;
+	refcount_set(&table->refcount, 0);
 	table->esw_nb.notifier_call = mlx5_sf_esw_event;
 	err = mlx5_esw_event_notifier_register(dev->priv.eswitch, &table->esw_nb);
 	if (err)
 		goto reg_err;
+
+	table->vhca_nb.notifier_call = mlx5_sf_vhca_event;
+	err = mlx5_vhca_event_notifier_register(table->dev, &table->vhca_nb);
+	if (err)
+		goto vhca_err;
+
 	return 0;
 
+vhca_err:
+	mlx5_esw_event_notifier_unregister(dev->priv.eswitch, &table->esw_nb);
 reg_err:
+	mutex_destroy(&table->sf_state_lock);
 	kfree(table);
 	dev->priv.sf_table = NULL;
 	return err;
@@ -305,8 +543,10 @@ void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev)
 	if (!table)
 		return;
 
+	mlx5_vhca_event_notifier_unregister(table->dev, &table->vhca_nb);
 	mlx5_esw_event_notifier_unregister(dev->priv.eswitch, &table->esw_nb);
 	WARN_ON(refcount_read(&table->refcount));
+	mutex_destroy(&table->sf_state_lock);
 	WARN_ON(!xa_empty(&table->port_indices));
 	kfree(table);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
index c7757f399e8a..6665949a2792 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
@@ -4,11 +4,14 @@
 #include "vhca_event.h"
 #include "priv.h"
 #include "sf.h"
+#include "mlx5_ifc_vhca_event.h"
+#include "vhca_event.h"
 #include "ecpf.h"
 
 struct mlx5_sf_hw {
 	u32 usr_sfnum;
 	u8 allocated: 1;
+	u8 pending_delete: 1;
 };
 
 struct mlx5_sf_hw_table {
@@ -16,6 +19,8 @@ struct mlx5_sf_hw_table {
 	struct mlx5_sf_hw *sfs;
 	int max_local_functions;
 	u8 ecpu: 1;
+	struct mutex table_lock; /* Serializes sf deletion and vhca state change handler. */
+	struct notifier_block vhca_nb;
 };
 
 u16 mlx5_sf_sw_to_hw_id(const struct mlx5_core_dev *dev, u16 sw_id)
@@ -23,6 +28,11 @@ u16 mlx5_sf_sw_to_hw_id(const struct mlx5_core_dev *dev, u16 sw_id)
 	return sw_id + mlx5_sf_start_function_id(dev);
 }
 
+static u16 mlx5_sf_hw_to_sw_id(const struct mlx5_core_dev *dev, u16 hw_id)
+{
+	return hw_id - mlx5_sf_start_function_id(dev);
+}
+
 int mlx5_sf_hw_table_sf_alloc(struct mlx5_core_dev *dev, u32 usr_sfnum)
 {
 	struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table;
@@ -34,10 +44,13 @@ int mlx5_sf_hw_table_sf_alloc(struct mlx5_core_dev *dev, u32 usr_sfnum)
 	if (!table->max_local_functions)
 		return -EOPNOTSUPP;
 
+	mutex_lock(&table->table_lock);
 	/* Check if sf with same sfnum already exists or not. */
 	for (i = 0; i < table->max_local_functions; i++) {
-		if (table->sfs[i].allocated && table->sfs[i].usr_sfnum == usr_sfnum)
-			return -EEXIST;
+		if (table->sfs[i].allocated && table->sfs[i].usr_sfnum == usr_sfnum) {
+			err = -EEXIST;
+			goto exist_err;
+		}
 	}
 
 	/* Find the free entry and allocate the entry from the array */
@@ -63,16 +76,19 @@ int mlx5_sf_hw_table_sf_alloc(struct mlx5_core_dev *dev, u32 usr_sfnum)
 	if (err)
 		goto vhca_err;
 
+	mutex_unlock(&table->table_lock);
 	return sw_id;
 
 vhca_err:
 	mlx5_cmd_dealloc_sf(table->dev, hw_fn_id);
 err:
 	table->sfs[i].allocated = false;
+exist_err:
+	mutex_unlock(&table->table_lock);
 	return err;
 }
 
-void mlx5_sf_hw_table_sf_free(struct mlx5_core_dev *dev, u16 id)
+void _mlx5_sf_hw_id_free(struct mlx5_core_dev *dev, u16 id)
 {
 	struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table;
 	u16 hw_fn_id;
@@ -80,6 +96,52 @@ void mlx5_sf_hw_table_sf_free(struct mlx5_core_dev *dev, u16 id)
 	hw_fn_id = mlx5_sf_sw_to_hw_id(table->dev, id);
 	mlx5_cmd_dealloc_sf(table->dev, hw_fn_id);
 	table->sfs[id].allocated = false;
+	table->sfs[id].pending_delete = false;
+}
+
+void mlx5_sf_hw_table_sf_free(struct mlx5_core_dev *dev, u16 id)
+{
+	struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table;
+	u16 hw_fn_id;
+
+	hw_fn_id = mlx5_sf_sw_to_hw_id(table->dev, id);
+	mutex_lock(&table->table_lock);
+	_mlx5_sf_hw_id_free(dev, id);
+	mutex_unlock(&table->table_lock);
+}
+
+void mlx5_sf_hw_table_sf_deferred_free(struct mlx5_core_dev *dev, u16 id)
+{
+	struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table;
+	u32 out[MLX5_ST_SZ_DW(query_vhca_state_out)] = {};
+	u16 hw_fn_id;
+	u8 state;
+	int err;
+
+	hw_fn_id = mlx5_sf_sw_to_hw_id(dev, id);
+	mutex_lock(&table->table_lock);
+	err = mlx5_cmd_query_vhca_state(dev, hw_fn_id, table->ecpu, out, sizeof(out));
+	if (err)
+		goto err;
+	state = MLX5_GET(query_vhca_state_out, out, vhca_state_context.vhca_state);
+	if (state == MLX5_VHCA_STATE_ALLOCATED) {
+		mlx5_cmd_dealloc_sf(table->dev, hw_fn_id);
+		table->sfs[id].allocated = false;
+	} else {
+		table->sfs[id].pending_delete = true;
+	}
+err:
+	mutex_unlock(&table->table_lock);
+}
+
+static void mlx5_sf_hw_dealloc_all(struct mlx5_sf_hw_table *table)
+{
+	int i;
+
+	for (i = 0; i < table->max_local_functions; i++) {
+		if (table->sfs[i].allocated)
+			_mlx5_sf_hw_id_free(table->dev, i);
+	}
 }
 
 int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev)
@@ -88,7 +150,7 @@ int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev)
 	struct mlx5_sf_hw *sfs;
 	int max_functions;
 
-	if (!mlx5_sf_supported(dev))
+	if (!mlx5_sf_supported(dev) || !mlx5_vhca_event_supported(dev))
 		return 0;
 
 	max_functions = mlx5_sf_max_functions(dev);
@@ -100,6 +162,7 @@ int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev)
 	if (!sfs)
 		goto table_err;
 
+	mutex_init(&table->table_lock);
 	table->dev = dev;
 	table->sfs = sfs;
 	table->max_local_functions = max_functions;
@@ -120,6 +183,53 @@ void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev)
 	if (!table)
 		return;
 
+	mutex_destroy(&table->table_lock);
 	kfree(table->sfs);
 	kfree(table);
 }
+
+static int mlx5_sf_hw_vhca_event(struct notifier_block *nb, unsigned long opcode, void *data)
+{
+	struct mlx5_sf_hw_table *table = container_of(nb, struct mlx5_sf_hw_table, vhca_nb);
+	const struct mlx5_vhca_state_event *event = data;
+	struct mlx5_sf_hw *sf_hw;
+	u16 sw_id;
+
+	if (event->new_vhca_state != MLX5_VHCA_STATE_ALLOCATED)
+		return 0;
+
+	sw_id = mlx5_sf_hw_to_sw_id(table->dev, event->function_id);
+	sf_hw = &table->sfs[sw_id];
+
+	mutex_lock(&table->table_lock);
+	/* SF driver notified through firmware that SF is finally detached.
+	 * Hence recycle the sf hardware id for reuse.
+	 */
+	if (sf_hw->allocated && sf_hw->pending_delete)
+		_mlx5_sf_hw_id_free(table->dev, sw_id);
+	mutex_unlock(&table->table_lock);
+	return 0;
+}
+
+int mlx5_sf_hw_table_create(struct mlx5_core_dev *dev)
+{
+	struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table;
+
+	if (!table)
+		return 0;
+
+	table->vhca_nb.notifier_call = mlx5_sf_hw_vhca_event;
+	return mlx5_vhca_event_notifier_register(table->dev, &table->vhca_nb);
+}
+
+void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev)
+{
+	struct mlx5_sf_hw_table *table = dev->priv.sf_hw_table;
+
+	if (!table)
+		return;
+
+	mlx5_vhca_event_notifier_unregister(table->dev, &table->vhca_nb);
+	/* Dealloc SFs whose firmware event has been missed. */
+	mlx5_sf_hw_dealloc_all(table);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h
index 7f3622375a9c..cb02a51d0986 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h
@@ -9,9 +9,13 @@
 int mlx5_cmd_alloc_sf(struct mlx5_core_dev *dev, u16 function_id);
 int mlx5_cmd_dealloc_sf(struct mlx5_core_dev *dev, u16 function_id);
 
+int mlx5_cmd_sf_enable_hca(struct mlx5_core_dev *dev, u16 func_id);
+int mlx5_cmd_sf_disable_hca(struct mlx5_core_dev *dev, u16 func_id);
+
 u16 mlx5_sf_sw_to_hw_id(const struct mlx5_core_dev *dev, u16 sw_id);
 
 int mlx5_sf_hw_table_sf_alloc(struct mlx5_core_dev *dev, u32 usr_sfnum);
 void mlx5_sf_hw_table_sf_free(struct mlx5_core_dev *dev, u16 id);
+void mlx5_sf_hw_table_sf_deferred_free(struct mlx5_core_dev *dev, u16 id);
 
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h
index dd23b6c2d887..296fd070617e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h
@@ -31,6 +31,9 @@ static inline u16 mlx5_sf_max_functions(const struct mlx5_core_dev *dev)
 int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev);
 void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev);
 
+int mlx5_sf_hw_table_create(struct mlx5_core_dev *dev);
+void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev);
+
 int mlx5_sf_table_init(struct mlx5_core_dev *dev);
 void mlx5_sf_table_cleanup(struct mlx5_core_dev *dev);
 
@@ -38,6 +41,13 @@ int mlx5_devlink_sf_port_new(struct devlink *devlink, const struct devlink_port_
 			     struct netlink_ext_ack *extack);
 int mlx5_devlink_sf_port_del(struct devlink *devlink, unsigned int port_index,
 			     struct netlink_ext_ack *extack);
+int mlx5_devlink_sf_port_fn_state_get(struct devlink *devlink, struct devlink_port *dl_port,
+				      enum devlink_port_function_state *state,
+				      enum devlink_port_function_opstate *opstate,
+				      struct netlink_ext_ack *extack);
+int mlx5_devlink_sf_port_fn_state_set(struct devlink *devlink, struct devlink_port *dl_port,
+				      enum devlink_port_function_state state,
+				      struct netlink_ext_ack *extack);
 #else
 
 static inline bool mlx5_sf_supported(const struct mlx5_core_dev *dev)
@@ -59,6 +69,15 @@ static inline void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev)
 {
 }
 
+static inline int mlx5_sf_hw_table_create(struct mlx5_core_dev *dev)
+{
+	return 0;
+}
+
+static inline void mlx5_sf_hw_table_destroy(struct mlx5_core_dev *dev)
+{
+}
+
 static inline int mlx5_sf_table_init(struct mlx5_core_dev *dev)
 {
 	return 0;
-- 
2.26.2
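
The deferred-free flow in the patch above can be modelled in plain userspace C.
This is only a sketch under stated assumptions, not the driver code: the table
size MAX_SFS, the base id SF_START_FN_ID, the two-value vhca_state enum and all
function names are hypothetical stand-ins, and locking and firmware commands
are left out. It shows how an entry that cannot be released immediately is
marked pending_delete and recycled later, once an "allocated" state event
arrives for its hardware id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SFS 4          /* hypothetical table size */
#define SF_START_FN_ID 100 /* hypothetical stand-in for mlx5_sf_start_function_id() */

/* Simplified, hypothetical view of the vhca states used by the patch. */
enum vhca_state { VHCA_ALLOCATED, VHCA_IN_USE };

struct sf_hw {
	bool allocated;
	bool pending_delete;
};

struct sf_hw_table {
	struct sf_hw sfs[MAX_SFS];
};

/* Software ids are table indices; hardware ids are offset by a base id. */
static uint16_t sw_to_hw_id(uint16_t sw_id) { return sw_id + SF_START_FN_ID; }
static uint16_t hw_to_sw_id(uint16_t hw_id) { return hw_id - SF_START_FN_ID; }

/* Release an entry and clear any pending deferred delete. */
static void hw_id_free(struct sf_hw_table *t, uint16_t sw_id)
{
	assert(sw_id < MAX_SFS);
	t->sfs[sw_id].allocated = false;
	t->sfs[sw_id].pending_delete = false;
}

/* Free now if the function is already back in the allocated state,
 * otherwise remember that a delete is pending.
 */
static void deferred_free(struct sf_hw_table *t, uint16_t sw_id,
			  enum vhca_state state)
{
	assert(sw_id < MAX_SFS);
	if (state == VHCA_ALLOCATED)
		t->sfs[sw_id].allocated = false;
	else
		t->sfs[sw_id].pending_delete = true;
}

/* Event handler: recycle the id once the function reports "allocated". */
static void vhca_event(struct sf_hw_table *t, uint16_t hw_id,
		       enum vhca_state new_state)
{
	uint16_t sw_id;

	if (new_state != VHCA_ALLOCATED)
		return;

	sw_id = hw_to_sw_id(hw_id);
	assert(sw_id < MAX_SFS);
	if (t->sfs[sw_id].allocated && t->sfs[sw_id].pending_delete)
		hw_id_free(t, sw_id);
}
```

In this model, calling deferred_free() on an entry whose function is still in
use leaves it allocated with pending_delete set; a later vhca_event() carrying
VHCA_ALLOCATED for the matching hardware id releases it.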



* [PATCH net-next v2 12/14] devlink: Add devlink port documentation
  2020-12-09  7:29 [PATCH net-next v2 00/14] Add mlx5 subfunction support saeed
                   ` (10 preceding siblings ...)
  2020-12-09  7:29 ` [PATCH net-next v2 11/14] net/mlx5: SF, Port function state change support saeed
@ 2020-12-09  7:29 ` saeed
  2020-12-09  7:29 ` [PATCH net-next v2 13/14] devlink: Extend devlink port documentation for subfunctions saeed
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: saeed @ 2020-12-09  7:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit, Jiri Pirko, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

Add documentation for devlink port and port function related commands.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
Changelog:
v5->v6:
 - added label for devlink port to refer in other documentation
 - rephrased port flavour description
 - added article a before "PCI device"
 - changed 'root complex' to 'root complexes'
 - changed 'may support' to 'support'
 - rephrased diagram description line
 - removed networking from port flavour description
v4->v5:
 - described logically ingress and egress point of devlink port
 - removed networking from devlink port description
 - rephrased port type description
 - introduce PCI controller section and description
 - rephrased controller, device, function description
 - removed confusing eswitch to system wording
 - rephrased port function description
 - added example of mac address in port function attribute description
v3->v4:
 - changed 'exist' to 'exists'
 - added 'an' eswitch
 - changed 'can have one' to 'consists of'
 - changed 'who intents' to 'that intends'
 - removed unnecessary comma
 - rewrote description for the example diagram
 - changed 'controller consist of' to 'controller consists of'
v2->v3:
 - rephrased many lines
 - first paragraph now describe devlink port
 - instead of saying PCI device/function, using PCI function every
   where
 - changed 'physical link layer' to 'link layer'
 - made devlink port type description more clear
 - made devlink port flavour description more clear
 - moved devlink port type table after port flavour
 - added description for the example diagram
 - describe CPU port that its linked to DSA
 - made devlink port description for eswitch port more clear
v1->v2:
 - Removed duplicate table entries for DEVLINK_PORT_FLAVOUR_VIRTUAL.
 - replaced 'consist of' to 'consisting'
 - changed 'can be' to 'can be of'
---
 .../networking/devlink/devlink-port.rst       | 118 ++++++++++++++++++
 Documentation/networking/devlink/index.rst    |   1 +
 2 files changed, 119 insertions(+)
 create mode 100644 Documentation/networking/devlink/devlink-port.rst

diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
new file mode 100644
index 000000000000..4c910dbb01ca
--- /dev/null
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -0,0 +1,118 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _devlink_port:
+
+============
+Devlink Port
+============
+
+``devlink-port`` is a port that exists on the device. It has a logically
+separate ingress/egress point of the device. A devlink port can be any one
+of many flavours. A devlink port flavour along with port attributes
+describes what a port represents.
+
+A device driver that intends to publish a devlink port sets the
+devlink port attributes and registers the devlink port.
+
+Devlink port flavours are described below.
+
+.. list-table:: List of devlink port flavours
+   :widths: 33 90
+
+   * - Flavour
+     - Description
+   * - ``DEVLINK_PORT_FLAVOUR_PHYSICAL``
+     - Any kind of physical port. This can be an eswitch physical port or any
+       other physical port on the device.
+   * - ``DEVLINK_PORT_FLAVOUR_DSA``
+     - This indicates a DSA interconnect port.
+   * - ``DEVLINK_PORT_FLAVOUR_CPU``
+     - This indicates a CPU port applicable only to DSA.
+   * - ``DEVLINK_PORT_FLAVOUR_PCI_PF``
+     - This indicates an eswitch port representing a port of PCI
+       physical function (PF).
+   * - ``DEVLINK_PORT_FLAVOUR_PCI_VF``
+     - This indicates an eswitch port representing a port of PCI
+       virtual function (VF).
+   * - ``DEVLINK_PORT_FLAVOUR_VIRTUAL``
+     - This indicates a virtual port for the PCI virtual function.
+
+Devlink port can have a different type based on the link layer described below.
+
+.. list-table:: List of devlink port types
+   :widths: 23 90
+
+   * - Type
+     - Description
+   * - ``DEVLINK_PORT_TYPE_ETH``
+     - Driver should set this port type when the link layer of the port is
+       Ethernet.
+   * - ``DEVLINK_PORT_TYPE_IB``
+     - Driver should set this port type when the link layer of the port is
+       InfiniBand.
+   * - ``DEVLINK_PORT_TYPE_AUTO``
+     - This type is indicated by the user when the driver should detect the
+       port type automatically.
+
+PCI controllers
+---------------
+In most cases a PCI device has only one controller. A controller consists of
+potentially multiple physical and virtual functions. Such PCI function consists
+of one or more ports. This port of the function is represented by the devlink
+eswitch port.
+
+A PCI Device connected to multiple CPUs or multiple PCI root complexes or
+SmartNIC, however, may have multiple controllers. For a device with multiple
+controllers, each controller is distinguished by a unique controller number.
+An eswitch on the PCI device supports ports of multiple controllers.
+
+An example view of a system with two controllers::
+
+                 ---------------------------------------------------------
+                 |                                                       |
+                 |           --------- ---------         ------- ------- |
+    -----------  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
+    | server  |  | -------   ----/---- ---/----- ------- ---/--- ---/--- |
+    | pci rc  |=== | pf0 |______/________/       | pf1 |___/_______/     |
+    | connect |  | -------                       -------                 |
+    -----------  |     | controller_num=1 (no eswitch)                   |
+                 ------|--------------------------------------------------
+                 (internal wire)
+                       |
+                 ---------------------------------------------------------
+                 | devlink eswitch ports and reps                        |
+                 | ----------------------------------------------------- |
+                 | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
+                 | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
+                 | ----------------------------------------------------- |
+                 | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
+                 | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
+                 | ----------------------------------------------------- |
+                 |                                                       |
+                 |                                                       |
+    -----------  |           --------- ---------         ------- ------- |
+    | smartNIC|  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
+    | pci rc  |==| -------   ----/---- ---/----- ------- ---/--- ---/--- |
+    | connect |  | | pf0 |______/________/       | pf1 |___/_______/     |
+    -----------  | -------                       -------                 |
+                 |                                                       |
+                 |  local controller_num=0 (eswitch)                     |
+                 ---------------------------------------------------------
+
+In the above example, the external controller (identified by controller number = 1)
+doesn't have an eswitch. The local controller (identified by controller number = 0)
+has the eswitch. The devlink instance on the local controller has eswitch devlink
+ports representing ports for both controllers.
+
+Port function configuration
+===========================
+
+A user can configure the port function attribute before enumerating the
+PCI function. Usually it means, user should configure port function attribute
+before a bus specific device for the function is created. However, when
+SRIOV is enabled, virtual function devices are created on the PCI bus.
+Hence, function attribute should be configured before binding virtual
+function device to the driver.
+
+User may set the hardware address of the function represented by the devlink
+port function. For Ethernet port function this means a MAC address.
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index d82874760ae2..aab79667f97b 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -18,6 +18,7 @@ general.
    devlink-info
    devlink-flash
    devlink-params
+   devlink-port
    devlink-region
    devlink-resource
    devlink-reload
-- 
2.26.2



* [PATCH net-next v2 13/14] devlink: Extend devlink port documentation for subfunctions
  2020-12-09  7:29 [PATCH net-next v2 00/14] Add mlx5 subfunction support saeed
                   ` (11 preceding siblings ...)
  2020-12-09  7:29 ` [PATCH net-next v2 12/14] devlink: Add devlink port documentation saeed
@ 2020-12-09  7:29 ` saeed
  2020-12-09  7:29 ` [PATCH net-next v2 14/14] net/mlx5: Add devlink subfunction port documentation saeed
  2020-12-11  4:11 ` [PATCH net-next v2 00/14] Add mlx5 subfunction support Samudrala, Sridhar
  14 siblings, 0 replies; 17+ messages in thread
From: saeed @ 2020-12-09  7:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

Add devlink port documentation for subfunction management.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
Changelog:
v1->v2:
 - new patch
---
 Documentation/driver-api/auxiliary_bus.rst    |  2 +
 .../networking/devlink/devlink-port.rst       | 89 ++++++++++++++++++-
 2 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/Documentation/driver-api/auxiliary_bus.rst b/Documentation/driver-api/auxiliary_bus.rst
index 2312506b0674..fff96c7ba7a8 100644
--- a/Documentation/driver-api/auxiliary_bus.rst
+++ b/Documentation/driver-api/auxiliary_bus.rst
@@ -1,5 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0-only
 
+.. _auxiliary_bus:
+
 =============
 Auxiliary Bus
 =============
diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index 4c910dbb01ca..c6924e7a341e 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -34,6 +34,9 @@ Devlink port flavours are described below.
    * - ``DEVLINK_PORT_FLAVOUR_PCI_VF``
      - This indicates an eswitch port representing a port of PCI
        virtual function (VF).
+   * - ``DEVLINK_PORT_FLAVOUR_PCI_SF``
+     - This indicates an eswitch port representing a port of PCI
+       subfunction (SF).
    * - ``DEVLINK_PORT_FLAVOUR_VIRTUAL``
      - This indicates a virtual port for the PCI virtual function.
 
@@ -57,9 +60,9 @@ Devlink port can have a different type based on the link layer described below.
 PCI controllers
 ---------------
 In most cases a PCI device has only one controller. A controller consists of
-potentially multiple physical and virtual functions. Such PCI function consists
-of one or more ports. This port of the function is represented by the devlink
-eswitch port.
+potentially multiple physical functions, virtual functions and subfunctions.
+Each such PCI function consists of one or more ports. Each port of the function
+is represented by a devlink eswitch port.
 
 A PCI Device connected to multiple CPUs or multiple PCI root complexes or
 SmartNIC, however, may have multiple controllers. For a device with multiple
@@ -112,7 +115,85 @@ PCI function. Usually it means, user should configure port function attribute
 before a bus specific device for the function is created. However, when
 SRIOV is enabled, virtual function devices are created on the PCI bus.
 Hence, function attribute should be configured before binding virtual
-function device to the driver.
+function device to the driver. For subfunctions, this means the user should
+configure the port function attributes before activating the port function.
 
 User may set the hardware address of the function represented by the devlink
 port function. For Ethernet port function this means a MAC address.
+
+Subfunctions
+============
+
+A subfunction is a lightweight function that has a parent PCI function on which
+it is deployed. Subfunctions are created and deployed in units of 1. Unlike
+SRIOV VFs, a subfunction doesn't require its own PCI virtual function. It
+communicates with the hardware through the parent PCI function. Subfunctions
+may scale better than SRIOV VFs.
+
+To use a subfunction, a 3-step setup sequence is followed:
+(1) create - create a subfunction;
+(2) configure - configure subfunction attributes;
+(3) deploy - deploy the subfunction;
+
+Subfunction management is done using the devlink port user interface.
+The user performs the setup on the subfunction management device.
+
+(1) Create
+----------
+A subfunction is created using the devlink port interface. The user adds the
+subfunction by adding a devlink port of subfunction flavour. The devlink
+kernel code calls down to the subfunction management driver (devlink op) and
+asks it to create a subfunction devlink port. The driver then instantiates the
+subfunction port and any associated objects such as health reporters and a
+representor netdevice.
+
+(2) Configure
+-------------
+The subfunction devlink port is created but it is not active yet. That means the
+entities are created on the devlink side and the e-switch port representor is
+created, but the subfunction device itself is not created. The user may use the
+e-switch port representor to apply settings, such as putting it into a bridge or
+adding TC rules. The user may also configure the hardware address (such as the
+MAC address) of the subfunction while the subfunction is inactive.
+
+(3) Deploy
+----------
+Once a subfunction is configured, the user must activate it to use it. Upon
+activation, the subfunction management driver asks the subfunction management
+device to instantiate the subfunction device on a particular PCI function.
+A subfunction device is created on the :ref:`Documentation/driver-api/auxiliary_bus.rst <auxiliary_bus>`. At this point a matching
+subfunction driver binds to the subfunction's auxiliary device.
+
+Terms and Definitions
+=====================
+
+.. list-table:: Terms and Definitions
+   :widths: 22 90
+
+   * - Term
+     - Definitions
+   * - ``PCI device``
+     - A physical PCI device having one or more PCI buses consists of one or
+       more PCI controllers.
+   * - ``PCI controller``
+     -  A controller consists of potentially multiple physical functions,
+        virtual functions and subfunctions.
+   * - ``Port function``
+     -  An object to manage the function of a port.
+   * - ``Subfunction``
+     -  A lightweight function that has a parent PCI function on which it is
+        deployed.
+   * - ``Subfunction device``
+     -  A bus device of the subfunction, usually on an auxiliary bus.
+   * - ``Subfunction driver``
+     -  A device driver for the subfunction auxiliary device.
+   * - ``Subfunction management device``
+     -  A PCI physical function that supports subfunction management.
+   * - ``Subfunction management driver``
+     -  A device driver for PCI physical function that supports
+        subfunction management using devlink port interface.
+   * - ``Subfunction host driver``
+     -  A device driver for a PCI physical function that hosts subfunction
+        devices. In most cases it is the same as the subfunction management
+        driver. When a subfunction is used on an external controller, the
+        subfunction management and host drivers are different.
-- 
2.26.2



* [PATCH net-next v2 14/14] net/mlx5: Add devlink subfunction port documentation
  2020-12-09  7:29 [PATCH net-next v2 00/14] Add mlx5 subfunction support saeed
                   ` (12 preceding siblings ...)
  2020-12-09  7:29 ` [PATCH net-next v2 13/14] devlink: Extend devlink port documentation for subfunctions saeed
@ 2020-12-09  7:29 ` saeed
  2020-12-11  4:11 ` [PATCH net-next v2 00/14] Add mlx5 subfunction support Samudrala, Sridhar
  14 siblings, 0 replies; 17+ messages in thread
From: saeed @ 2020-12-09  7:29 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	Sridhar Samudrala, david.m.ertman, dan.j.williams, kiran.patil,
	gregkh, Parav Pandit, Saeed Mahameed

From: Parav Pandit <parav@nvidia.com>

Add documentation for subfunction management using devlink
port.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
Changelog:
v1->v2:
 - new patch
---
 .../device_drivers/ethernet/mellanox/mlx5.rst | 204 ++++++++++++++++++
 1 file changed, 204 insertions(+)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
index a5eb22793bb9..07e38c044355 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
@@ -12,6 +12,8 @@ Contents
 - `Enabling the driver and kconfig options`_
 - `Devlink info`_
 - `Devlink parameters`_
+- `mlx5 subfunction`_
+- `mlx5 port function`_
 - `Devlink health reporters`_
 - `mlx5 tracepoints`_
 
@@ -181,6 +183,208 @@ User command examples:
       values:
          cmode driverinit value true
 
+mlx5 subfunction
+================
+mlx5 supports subfunction management using the devlink port interface (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`).
+
+A subfunction has its own function capabilities and its own resources. This
+means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These queues
+are neither shared with nor stolen from the parent PCI function.
+
+When a subfunction is RDMA capable, it has its own QP1, GID table and RDMA
+resources, neither shared with nor stolen from the parent PCI function.
+
+A subfunction has a dedicated window in PCI BAR space that is not shared
+with the other subfunctions or the parent PCI function. This ensures that all
+class devices of the subfunction access only the assigned PCI BAR space.
+
+A subfunction supports eswitch representation through which it supports tc
+offloads. The user must configure the eswitch to send/receive packets from/to
+the subfunction port.
+
+Subfunctions share PCI level resources such as PCI MSI-X IRQs with
+the other subfunctions and/or with their parent PCI function.
+
+Example mlx5 software, system and device view::
+
+       _______
+      | admin |
+      | user  |----------
+      |_______|         |
+          |             |
+      ____|____       __|______            _________________
+     |         |     |         |          |                 |
+     | devlink |     | tc tool |          |    user         |
+     | tool    |     |_________|          | applications    |
+     |_________|         |                |_________________|
+           |             |                   |          |
+           |             |                   |          |         Userspace
+ +---------|-------------|-------------------|----------|--------------------+
+           |             |           +----------+   +----------+   Kernel
+           |             |           |  netdev  |   | rdma dev |
+           |             |           +----------+   +----------+
+   (devlink port add/del |              ^               ^
+    port function set)   |              |               |
+           |             |              +---------------|
+      _____|___          |              |        _______|_______
+     |         |         |              |       | mlx5 class    |
+     | devlink |   +------------+       |       |   drivers     |
+     | kernel  |   | rep netdev |       |       |(mlx5_core,ib) |
+     |_________|   +------------+       |       |_______________|
+           |             |              |               ^
+   (devlink ops)         |              |          (probe/remove)
+  _________|________     |              |           ____|________
+ | subfunction      |    |     +---------------+   | subfunction |
+ | management driver|-----     | subfunction   |---|  driver     |
+ | (mlx5_core)      |          | auxiliary dev |   | (mlx5_core) |
+ |__________________|          +---------------+   |_____________|
+           |                                            ^
+  (sf add/del, vhca events)                             |
+           |                                      (device add/del)
+      _____|____                                    ____|________
+     |          |                                  | subfunction |
+     |  PCI NIC |--- activate/deactivate events--->| host driver |
+     |__________|                                  | (mlx5_core) |
+                                                   |_____________|
+
+A subfunction is created using the devlink port interface.
+
+- Change device to switchdev mode::
+
+    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
+
+- Add a devlink port of subfunction flavour::
+
+    $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
+
+- Show a devlink port of the subfunction::
+
+    $ devlink port show pci/0000:06:00.0/32768
+    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
+      function:
+        hw_addr 00:00:00:00:00:00
+
+- Delete a devlink port of subfunction after use::
+
+    $ devlink port del pci/0000:06:00.0/32768
+
+mlx5 port function
+==================
+The mlx5 driver provides a mechanism to set up PCI VF/SF port function
+attributes in a unified way for SmartNIC and non-SmartNIC NICs.
+
+This is supported only when the eswitch mode is set to switchdev. Port function
+configuration of the PCI VF/SF is supported through the devlink eswitch port.
+
+Port function attributes should be set before the PCI VF/SF is enumerated by the
+driver.
+
+MAC address setup
+-----------------
+The mlx5 driver provides a mechanism to set up the MAC address of the PCI VF/SF.
+
+The configured MAC address of the PCI VF/SF will be used by the netdevice and RDMA
+device created for the PCI VF/SF.
+
+- Get MAC address of the VF identified by its unique devlink port index::
+
+    $ devlink port show pci/0000:06:00.0/2
+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+      function:
+        hw_addr 00:00:00:00:00:00
+
+- Set MAC address of the VF identified by its unique devlink port index::
+
+    $ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55
+
+    $ devlink port show pci/0000:06:00.0/2
+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+      function:
+        hw_addr 00:11:22:33:44:55
+
+- Get MAC address of the SF identified by its unique devlink port index::
+
+    $ devlink port show pci/0000:06:00.0/32768
+    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
+      function:
+        hw_addr 00:00:00:00:00:00
+
+- Set MAC address of the SF identified by its unique devlink port index::
+
+    $ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88
+
+    $ devlink port show pci/0000:06:00.0/32768
+    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
+      function:
+        hw_addr 00:00:00:00:88:88
+
+SF state setup
+--------------
+To use the SF, the user must activate the SF using the SF port function state attribute.
+
+- Get state of the SF identified by its unique devlink port index::
+
+   $ devlink port show ens2f0npf0sf88
+   pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
+     function:
+       hw_addr 00:00:00:00:88:88 state inactive opstate detached
+
+- Activate the function and verify its state is active::
+
+   $ devlink port function set ens2f0npf0sf88 state active
+
+   $ devlink port show ens2f0npf0sf88
+   pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
+     function:
+       hw_addr 00:00:00:00:88:88 state active opstate detached
+
+Upon function activation, the PF driver instance gets an event from the device
+that a particular SF was activated. This is the cue to put the device on the bus,
+probe it and instantiate a devlink instance and class specific auxiliary devices for it.
+
+- Show the auxiliary device and port of the subfunction::
+
+    $ devlink dev show
+    devlink dev show auxiliary/mlx5_core.sf.4
+
+    $ devlink port show auxiliary/mlx5_core.sf.4/1
+    auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false
+
+    $ rdma link show mlx5_0/1
+    link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88
+
+    $ rdma dev show
+    8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
+    13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112
+
+- Subfunction auxiliary device and class device hierarchy::
+
+                 mlx5_core.sf.4
+          (subfunction auxiliary device)
+                       /\
+                      /  \
+                     /    \
+                    /      \
+                   /        \
+      mlx5_core.eth.4     mlx5_core.rdma.4
+     (sf eth aux dev)     (sf rdma aux dev)
+         |                      |
+         |                      |
+      p0sf88                  mlx5_0
+     (sf netdev)          (sf rdma device)
+
+Additionally, the SF port also gets an event when the driver attaches to the
+auxiliary device of the subfunction. This changes the operational state of the
+function, which gives the user the visibility to decide when it is safe to
+delete the SF port for graceful termination of the subfunction.
+
+- Show the SF port operational state::
+
+    $ devlink port show ens2f0npf0sf88
+    pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
+      function:
+        hw_addr 00:00:00:00:88:88 state active opstate attached
+
 Devlink health reporters
 ========================
 
-- 
2.26.2
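
The state and opstate fields shown in the documentation above can also be
checked from a script. The following is a minimal sketch and not part of the
patch: it parses the text printed by "devlink port show" (the sample is copied
from the documentation) and reports whether the port looks ready for deletion.
The helper names parse_port_show() and safe_to_delete() are hypothetical, and
treating "state inactive, opstate detached" as the safe condition is an
assumption based on the description of graceful termination, not something
the patch states explicitly.

```python
# Sample output copied from the documentation in this patch.
SAMPLE = """\
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state active opstate attached
"""


def parse_port_show(text):
    """Parse one port entry of `devlink port show` text into a dict.

    Hypothetical helper: it assumes the plain-text layout shown above,
    where attributes are whitespace-separated key/value pairs.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    # The port handle ends at the first ": " (colon followed by a space).
    handle, _, rest = lines[0].partition(": ")
    tokens = rest.split()
    port = {"handle": handle}
    port.update(zip(tokens[0::2], tokens[1::2]))

    function = {}
    for ln in lines[1:]:
        if ln == "function:":
            continue
        toks = ln.split()
        function.update(zip(toks[0::2], toks[1::2]))
    port["function"] = function
    return port


def safe_to_delete(port):
    """Assumption: a port is safe to delete once it is inactive and detached."""
    fn = port["function"]
    return fn.get("state") == "inactive" and fn.get("opstate") == "detached"


port = parse_port_show(SAMPLE)
print(port["handle"], port["function"]["state"], port["function"]["opstate"])
print("safe to delete:", safe_to_delete(port))
```

On a real system the text would come from running the devlink command rather
than from a constant; devlink also has a JSON output mode that avoids text
parsing, which is likely the better choice for anything beyond a quick check.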



* Re: [PATCH net-next v2 00/14] Add mlx5 subfunction support
  2020-12-09  7:29 [PATCH net-next v2 00/14] Add mlx5 subfunction support saeed
                   ` (13 preceding siblings ...)
  2020-12-09  7:29 ` [PATCH net-next v2 14/14] net/mlx5: Add devlink subfunction port documentation saeed
@ 2020-12-11  4:11 ` Samudrala, Sridhar
  2020-12-11  8:33   ` Parav Pandit
  14 siblings, 1 reply; 17+ messages in thread
From: Samudrala, Sridhar @ 2020-12-11  4:11 UTC (permalink / raw)
  To: saeed, David S. Miller, Jakub Kicinski, Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	david.m.ertman, dan.j.williams, kiran.patil, gregkh,
	Parav Pandit

On 12/8/2020 11:29 PM, saeed@kernel.org wrote:
> From: Parav Pandit <parav@nvidia.com>
>
> Hi Dave, Jakub, Jason,
>
> This series form Parav was the theme of this mlx5 release cycle,
> we've been waiting anxiously for the auxbus infrastructure to make it into
> the kernel, and now as the auxbus is in and all the stars are aligned, I
> can finally submit this V2 of the devlink and mlx5 subfunction support.
>
> Subfunctions came to solve the scaling issue of virtualization
> and switchdev environments, where SRIOV failed to deliver and users ran
> out of VFs very quickly as SRIOV demands huge amount of physical resources
> in both of the servers and the NIC.
>
> Subfunction provide the same functionality as SRIOV but in a very
> lightweight manner, please see the thorough and detailed
> documentation from Parav below, in the commit messages and the
> Networking documentation patches at the end of this series.

What is the mechanism for assigning these subfunctions to VMs?
Or is this only targeted at container use cases at this time?

>
> Sending V2 as a continuation to V1 that was sent Last month [0],
> Parav has provided full change-log in the commit message of each patch.
> [0] https://lore.kernel.org/linux-rdma/20201112192424.2742-1-parav@nvidia.com/
>
> Parav Pandit Says:
> =================
>
> This patchset introduces support for mlx5 subfunction (SF).
>
> A subfunction is a lightweight function that has a parent PCI function on
> which it is deployed. mlx5 subfunction has its own function capabilities
> and its own resources. This means a subfunction has its own dedicated
> queues(txq, rxq, cq, eq). These queues are neither shared nor stealed from
> the parent PCI function.
>
> When subfunction is RDMA capable, it has its own QP1, GID table and rdma
> resources neither shared nor stealed from the parent PCI function.
>
> A subfunction has dedicated window in PCI BAR space that is not shared
> with ther other subfunctions or parent PCI function. This ensures that all
> class devices of the subfunction accesses only assigned PCI BAR space.
>
> A Subfunction supports eswitch representation through which it supports tc
> offloads. User must configure eswitch to send/receive packets from/to
> subfunction port.
>
> Subfunctions share PCI level resources such as PCI MSI-X IRQs with
> their other subfunctions and/or with its parent PCI function.
>
> Patch summary:
> --------------
> Patch 1 to 4 prepares devlink
> patch 5 to 7 mlx5 adds SF device support
> Patch 8 to 11 mlx5 adds SF devlink port support
> Patch 12 and 14 adds documentation
>
> Patch-1 prepares code to handle multiple port function attributes
> Patch-2 introduces devlink pcisf port flavour similar to pcipf and pcivf
> Patch-3 adds port add and delete driver callbacks
> Patch-4 adds port function state get and set callbacks
> Patch-5 mlx5 vhca event notifier support to distribute subfunction
>          state change notification
> Patch-6 adds SF auxiliary device
> Patch-7 adds SF auxiliary driver
> Patch-8 prepares eswitch to handler SF vport
> Patch-9 adds eswitch helpers to add/remove SF vport
> Patch-10 implements devlink port add/del callbacks
> Patch-11 implements devlink port function get/set callbacks
> Patch-12 to 14 adds documentation
> Patch-12 added mlx5 port function documentation
> Patch-13 adds subfunction documentation
> Patch-14 adds mlx5 subfunction documentation
>
> Subfunction support is discussed in detail in RFC [1] and [2].
> RFC [1] and extension [2] describes requirements, design and proposed
> plumbing using devlink, auxiliary bus and sysfs for systemd/udev
> support. Functionality of this patchset is best explained using real
> examples further below.
>
> overview:
> --------
> A subfunction can be created and deleted by a user using devlink port
> add/delete interface.
>
> A subfunction can be configured using devlink port function attribute
> before its activated.
>
> When a subfunction is activated, it results in an auxiliary device on
> the host PCI device where it is deployed. A driver binds to the
> auxiliary device that further creates supported class devices.
>
> example subfunction usage sequence:
> -----------------------------------
> Change device to switchdev mode:
> $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
>
> Add a devlink port of subfunction flaovur:
> $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
Is there any requirement that subfunctions can be created only when the
eswitch mode is set to switchdev?
I think we should not restrict this functionality to switchdev mode.

After this step, I guess an auxiliary device is created on the auxiliary
bus, along with a devlink port.
Does "devlink port show" show this port, and can we list the auxiliary
device?
> Configure mac address of the port function:
> $ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88
What is ens2f0npf0sf88? Is this the port representor netdev? I think we
should allow setting this by passing the devlink port.

What about other attributes like number of queues, interrupt vectors,
port capabilities, etc.? Can we add other attributes via this interface?
>
> Now activate the function:
> $ devlink port function set ens2f0npf0sf88 state active
Is the subfunction netdev created after this step?
I thought there was a step to bind the auxiliary device to the driver.
How does the probe routine for the auxiliary device get invoked?
>
> Now use the auxiliary device and class devices:
> $ devlink dev show
> pci/0000:06:00.0
> auxiliary/mlx5_core.sf.4
>
> $ ip link show
> 127: ens2f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>      link/ether 24:8a:07:b3:d1:12 brd ff:ff:ff:ff:ff:ff
>      altname enp6s0f0np0
> 129: p0sf88: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>      link/ether 00:00:00:00:88:88 brd ff:ff:ff:ff:ff:ff
>
> $ rdma dev show
> 43: rdmap6s0f0: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d112 sys_image_guid 248a:0703:00b3:d112
> 44: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112
>
> After use inactivate the function:
> $ devlink port function set ens2f0npf0sf88 state inactive
>
> Now delete the subfunction port:
> $ devlink port del ens2f0npf0sf88
>
> [1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
> [2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2
>
> =================
> ---
> Changelog:
> v1->v2:
>   - added documentation for subfunction and its mlx5 implementation
>   - add MLX5_SF config option documentation
>   - rebased
>   - dropped devlink global lock improvement patch as mlx5 doesn't support
>     reload while SFs are allocated
>   - dropped devlink reload lock patch as mlx5 doesn't support reload
>     when SFs are allocated
>   - using updated vhca event from device to add remove auxiliary device
>   - split sf devlink port allocation and sf hardware context allocation
>
> Parav Pandit (13):
>    devlink: Prepare code to fill multiple port function attributes
>    devlink: Introduce PCI SF port flavour and port attribute
>    devlink: Support add and delete devlink port
>    devlink: Support get and set state of port function
>    net/mlx5: Introduce vhca state event notifier
>    net/mlx5: SF, Add auxiliary device support
>    net/mlx5: SF, Add auxiliary device driver
>    net/mlx5: E-switch, Add eswitch helpers for SF vport
>    net/mlx5: SF, Add port add delete functionality
>    net/mlx5: SF, Port function state change support
>    devlink: Add devlink port documentation
>    devlink: Extend devlink port documentation for subfunctions
>    net/mlx5: Add devlink subfunction port documentation
>
> Vu Pham (1):
>    net/mlx5: E-switch, Prepare eswitch to handle SF vport
>
>   Documentation/driver-api/auxiliary_bus.rst    |   2 +
>   .../device_drivers/ethernet/mellanox/mlx5.rst | 209 +++++++
>   .../networking/devlink/devlink-port.rst       | 199 +++++++
>   Documentation/networking/devlink/index.rst    |   1 +
>   .../net/ethernet/mellanox/mlx5/core/Kconfig   |  19 +
>   .../net/ethernet/mellanox/mlx5/core/Makefile  |   9 +
>   drivers/net/ethernet/mellanox/mlx5/core/cmd.c |   8 +
>   .../net/ethernet/mellanox/mlx5/core/devlink.c |  19 +
>   drivers/net/ethernet/mellanox/mlx5/core/eq.c  |   5 +-
>   .../mellanox/mlx5/core/esw/acl/egress_ofld.c  |   2 +-
>   .../mellanox/mlx5/core/esw/devlink_port.c     |  41 ++
>   .../net/ethernet/mellanox/mlx5/core/eswitch.c |  48 +-
>   .../net/ethernet/mellanox/mlx5/core/eswitch.h |  78 +++
>   .../mellanox/mlx5/core/eswitch_offloads.c     |  47 +-
>   .../net/ethernet/mellanox/mlx5/core/events.c  |   7 +
>   .../net/ethernet/mellanox/mlx5/core/main.c    |  60 +-
>   .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  12 +
>   .../net/ethernet/mellanox/mlx5/core/pci_irq.c |  20 +
>   .../net/ethernet/mellanox/mlx5/core/sf/cmd.c  |  48 ++
>   .../ethernet/mellanox/mlx5/core/sf/dev/dev.c  | 271 +++++++++
>   .../ethernet/mellanox/mlx5/core/sf/dev/dev.h  |  55 ++
>   .../mellanox/mlx5/core/sf/dev/driver.c        | 101 ++++
>   .../ethernet/mellanox/mlx5/core/sf/devlink.c  | 552 ++++++++++++++++++
>   .../ethernet/mellanox/mlx5/core/sf/hw_table.c | 235 ++++++++
>   .../mlx5/core/sf/mlx5_ifc_vhca_event.h        |  82 +++
>   .../net/ethernet/mellanox/mlx5/core/sf/priv.h |  21 +
>   .../net/ethernet/mellanox/mlx5/core/sf/sf.h   |  92 +++
>   .../mellanox/mlx5/core/sf/vhca_event.c        | 189 ++++++
>   .../mellanox/mlx5/core/sf/vhca_event.h        |  57 ++
>   .../net/ethernet/mellanox/mlx5/core/vport.c   |   3 +-
>   include/linux/mlx5/driver.h                   |  16 +-
>   include/net/devlink.h                         |  79 +++
>   include/uapi/linux/devlink.h                  |  26 +
>   net/core/devlink.c                            | 266 ++++++++-
>   34 files changed, 2832 insertions(+), 47 deletions(-)
>   create mode 100644 Documentation/networking/devlink/devlink-port.rst
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/cmd.c
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.c
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/dev/dev.h
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/mlx5_ifc_vhca_event.h
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/sf.h
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.c
>   create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sf/vhca_event.h
>



* RE: [PATCH net-next v2 00/14] Add mlx5 subfunction support
  2020-12-11  4:11 ` [PATCH net-next v2 00/14] Add mlx5 subfunction support Samudrala, Sridhar
@ 2020-12-11  8:33   ` Parav Pandit
  0 siblings, 0 replies; 17+ messages in thread
From: Parav Pandit @ 2020-12-11  8:33 UTC (permalink / raw)
  To: Samudrala, Sridhar, saeed, David S. Miller, Jakub Kicinski,
	Jason Gunthorpe
  Cc: Leon Romanovsky, netdev, linux-rdma, David Ahern, Jacob Keller,
	david.m.ertman, dan.j.williams, kiran.patil, gregkh



> From: Samudrala, Sridhar <sridhar.samudrala@intel.com>
> Sent: Friday, December 11, 2020 9:42 AM
> 
> On 12/8/2020 11:29 PM, saeed@kernel.org wrote:
> > From: Parav Pandit <parav@nvidia.com>


> > Subfunction provide the same functionality as SRIOV but in a very
> > lightweight manner, please see the thorough and detailed documentation
> > from Parav below, in the commit messages and the Networking
> > documentation patches at the end of this series.
> 
> What is the mechanism for assigning these subfunctions to VMs?
> OR is this only targeted for container usecases at this time?
> 
Currently a subfunction cannot be assigned to a VM as is.
Some vfio_pci style software may be developed in the future to map a subfunction auxiliary device to a VM.

> >
> > Add a devlink port of subfunction flaovur:
> > $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
> Is there any requirement that subfunctions can be created only when
> eswitch mode is set to switchdev?
> I think we should not restrict this functionality without switchdev mode .
> 
It is not restricted. We discussed this before at [3].

> After this step, i guess an auxiliary device is created on the auxiliary bus and a
> devlink port.
> Does "devlink port show" show this port and can we list the auxiliary device.
Yes and yes.
The command below will show the auxiliary device.
The auxiliary device is listed in detail in patch_7 at [4] when it is created.
$ devlink dev show auxiliary/mlx5_core.sf.4/
More below.

> > Configure mac address of the port function:
> > $ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88
> What is ens2f0npf0sf88? Is this the port representer netdev? 
Yes, it is the representor netdev associated with the devlink port.

> I think we should allow setting this by passing the devlink port.
Absolutely. It is. Every devlink port is identified by a unique port index.
So
$ devlink port show pci/0000:06:00.0/<devlink_port_index>  will show it.

It is captured in a detailed example in the commit log of patch_7, which adds it, at [4].
It is also present in the mlx5.rst documentation in patch_14 at [5].

I just used the representor netdev example as it was intuitive to view the world from the eswitch side.
But yes, using the port index instead of the netdev is already supported natively.

> 
> What about other attributes like number of queues, interrupt vectors and
> port capabilities etc? Can we add other attributes via this interface?
> >
We believe that the capabilities of the function should be controlled using the port function set command.
At the moment only the MAC address can be configured.
The number of queues is a resource, so devlink resource is a more suitable interface for it.

> > Now activate the function:
> > $ devlink port function set ens2f0npf0sf88 state active
> Is the subfunction netdev created after this step?
Yes.
> I thought there was a step to bind the auxiliary device to the driver.
Yes. The user can always bind/unbind the auxiliary driver to/from the auxiliary device.
Currently the auxiliary bus does not have an option to disable autoprobe (per device).
This is something to be extended in the future so that the user can select how a subfunction device is to be used in the host system.

> How does the probe routine for the auxiliary device get invoked?
> >
When the subfunction auxiliary device is placed on the auxiliary bus, the driver core invokes the registered driver's probe routine.
Please refer to patch_7 at [4]. It is similar to how a PCI device is probed.
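
The probe flow described above can be illustrated with a toy model. This is
only a sketch in Python, not kernel code: the ToyAuxiliaryBus class and its
methods are hypothetical, and the matching rule (device name with its trailing
".<id>" removed must equal the name the driver registered) is a simplified
assumption about how auxiliary bus matching behaves. It shows that with
autoprobe, adding a device to the bus is enough to trigger the probe of an
already registered driver, with no separate bind step by the user.

```python
class ToyAuxiliaryBus:
    """Toy stand-in for a bus with autoprobe; not the kernel implementation."""

    def __init__(self):
        self.drivers = {}  # match name -> probe callable
        self.devices = {}  # device name -> bound match name, or None

    def register_driver(self, match_name, probe):
        """Register a driver and probe any matching devices already present."""
        self.drivers[match_name] = probe
        for dev_name, bound in list(self.devices.items()):
            if bound is None:
                self._try_probe(dev_name)

    def add_device(self, dev_name):
        """Place a device on the bus; autoprobe runs immediately."""
        self.devices[dev_name] = None
        self._try_probe(dev_name)

    def _try_probe(self, dev_name):
        # Assumed rule: "mlx5_core.sf.4" matches a driver named "mlx5_core.sf".
        match_name = dev_name.rsplit(".", 1)[0]
        probe = self.drivers.get(match_name)
        if probe is not None:
            probe(dev_name)
            self.devices[dev_name] = match_name


probed = []
bus = ToyAuxiliaryBus()
# The SF auxiliary driver registers itself first (e.g. at module load).
bus.register_driver("mlx5_core.sf", probed.append)
# Activating the SF results in the device being added to the bus.
bus.add_device("mlx5_core.sf.4")
print(probed)
```

The same model also covers the opposite ordering (device present before the
driver registers), since register_driver() probes unbound devices.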

[3] https://lore.kernel.org/netdev/BY5PR12MB43225AA5A5E42E76C03F645BDC3F0@BY5PR12MB4322.namprd12.prod.outlook.com/
[4] https://lore.kernel.org/netdev/20201209072934.1272819-4-saeed@kernel.org/
[5] https://lore.kernel.org/netdev/20201209072934.1272819-15-saeed@kernel.org/

