* [PATCH net-next V3 0/8] devlink: Add port function attribute to enable/disable Roce and migratable
@ 2022-12-04 14:16 Shay Drory
  2022-12-04 14:16 ` [PATCH net-next V3 1/8] net/mlx5: Introduce IFC bits for migratable Shay Drory
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Shay Drory @ 2022-12-04 14:16 UTC (permalink / raw)
  To: netdev, kuba, davem; +Cc: danielj, yishaih, jiri, saeedm, parav, Shay Drory

This series is a complete rewrite of the series "devlink: Add port
function attribute to enable/disable roce"
link:
https://lore.kernel.org/netdev/20221102163954.279266-1-danielj@nvidia.com/

Currently, mlx5 PCI VFs and SFs have RoCE functionality enabled by
default, and mlx5 PCI VFs have migratable functionality disabled by
default.

Currently a user does not have the ability to disable RoCE for a PCI
VF/SF device before such device is enumerated by the driver.

A user also cannot apply such a setting to a VF from the smartnic in a
smartnic scenario.

The current 'enable_roce' device knob can only be set at driverinit
time. By then the device has already been created and firmware has
already allocated the system memory needed to support RoCE.

Also, currently a user does not have the ability to enable migratable
for a PCI VF.

The above are hypervisor-level controls that set the functionality of
devices passed through to guests.

This is achieved by extending the existing 'port function' object to
control the capabilities of a function, which lets users set a device's
capabilities before it is enumerated.

Example of a user disabling RoCE for a VF when using switchdev
mode:

$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev pf0vf0 flavour pcivf controller 0
pfnum 0 vfnum 0 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 roce enable

$ devlink port function set pci/0000:06:00.0/1 roce disable
  
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev pf0vf0 flavour pcivf controller 0
pfnum 0 vfnum 0 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 roce disable

FAQs:
-----
1. What does roce enable/disable do?
Ans: Disabling it turns off the RoCE capability of the function before it
is enumerated, so when the driver reads the capability from the device
firmware, it is already disabled.
At this point the RDMA stack will not be able to create UD, QP1, RC or XRC
QPs. When RoCE is disabled, the GID table of all ports of the device is
disabled in both the device and the software stack.

2. How is the roce 'port function' option different from existing
devlink param?
Ans: The RoCE attribute at the port function level disables the RoCE
capability at the specific function level, while enable_roce only does
so at the software level.

3. Why is this option for disabling only RoCE and not the whole RDMA
device?
Ans: Because the user still wants to use the RDMA device for non-RoCE
commands in a more memory-efficient way.

Patch summary:
Patch-1 introduces IFC bits for migratable
Patch-2 avoids partial port function request processing
Patch-3 moves devlink hw_addr attribute doc to devlink file
Patch-4 adds devlink attribute to control roce
Patch-5 adds generic getters for other functions caps
Patch-6 implements mlx5 callbacks for roce control
Patch-7 adds devlink attribute to control migratable
Patch-8 implements mlx5 callbacks for migratable control

---
v2->v3:
 - see patches 2,4,7 for a changelog.
v1->v2:
 - see patch 7 for a changelog.

Shay Drory (6):
  devlink: Validate port function request
  devlink: Move devlink port function hw_addr attr documentation
  devlink: Expose port function commands to control RoCE
  net/mlx5: Add generic getters for other functions caps
  devlink: Expose port function commands to control migratable
  net/mlx5: E-Switch, Implement devlink port function cmds to control
    migratable

Yishai Hadas (2):
  net/mlx5: Introduce IFC bits for migratable
  net/mlx5: E-Switch, Implement devlink port function cmds to control
    RoCE

 .../device_drivers/ethernet/mellanox/mlx5.rst |  46 ++--
 .../networking/devlink/devlink-port.rst       | 122 +++++++++-
 .../net/ethernet/mellanox/mlx5/core/devlink.c |   4 +
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |  43 ++++
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  11 +-
 .../mellanox/mlx5/core/eswitch_offloads.c     | 210 +++++++++++++++++-
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   5 +-
 .../net/ethernet/mellanox/mlx5/core/pci_irq.c |   3 +-
 .../net/ethernet/mellanox/mlx5/core/vport.c   |  30 ++-
 include/linux/mlx5/mlx5_ifc.h                 |   6 +-
 include/linux/mlx5/vport.h                    |   2 +
 include/net/devlink.h                         |  40 ++++
 include/uapi/linux/devlink.h                  |  13 ++
 net/core/devlink.c                            | 198 ++++++++++++++++-
 14 files changed, 685 insertions(+), 48 deletions(-)

-- 
2.38.1



* [PATCH net-next V3 1/8] net/mlx5: Introduce IFC bits for migratable
  2022-12-04 14:16 [PATCH net-next V3 0/8] devlink: Add port function attribute to enable/disable Roce and migratable Shay Drory
@ 2022-12-04 14:16 ` Shay Drory
  2022-12-04 14:16 ` [PATCH net-next V3 2/8] devlink: Validate port function request Shay Drory
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Shay Drory @ 2022-12-04 14:16 UTC (permalink / raw)
  To: netdev, kuba, davem; +Cc: danielj, yishaih, jiri, saeedm, parav

From: Yishai Hadas <yishaih@nvidia.com>

Introduce IFC related capabilities to enable setting a VF to be able to
perform live migration, i.e. to be migratable.
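
To illustrate the layout change in the hunk below, here is a minimal
userspace sketch. It is not the mlx5 accessor code: ifc_get_bit() and
ifc_set_bit() are hypothetical helpers, and the sketch assumes bit offsets
are counted from the most significant bit of a big-endian buffer. The
widths are copied from the diff; the static assertion checks that the three
new fields cover exactly the old 0xa0-bit reserved range, so later fields
keep their offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit widths taken from the mlx5_ifc_cmd_hca_cap_2_bits hunk in this patch. */
enum {
	OLD_RESERVED_AT_0_BITS = 0xa0,
	NEW_RESERVED_AT_0_BITS = 0x80,
	MIGRATABLE_BITS        = 0x1,
	RESERVED_AT_81_BITS    = 0x1f,
	/* migratable starts right after the shortened reserved field */
	MIGRATABLE_BIT_OFFSET  = NEW_RESERVED_AT_0_BITS,
};

/* 0x80 + 0x1 + 0x1f must equal 0xa0 so that later fields do not move. */
_Static_assert(NEW_RESERVED_AT_0_BITS + MIGRATABLE_BITS +
	       RESERVED_AT_81_BITS == OLD_RESERVED_AT_0_BITS,
	       "layout must not shift later fields");

/* Hypothetical helper: read one bit, offsets counted from the MSB
 * of a big-endian byte buffer (an assumption of this sketch).
 */
static bool ifc_get_bit(const uint8_t *buf, size_t bit_off)
{
	assert(buf != NULL);
	return (buf[bit_off / 8] >> (7 - bit_off % 8)) & 1;
}

/* Hypothetical helper: write one bit using the same numbering. */
static void ifc_set_bit(uint8_t *buf, size_t bit_off, bool val)
{
	uint8_t mask = (uint8_t)(1u << (7 - bit_off % 8));

	assert(buf != NULL);
	if (val)
		buf[bit_off / 8] |= mask;
	else
		buf[bit_off / 8] &= (uint8_t)~mask;
}
```

Under that numbering assumption, offset 0x80 lands on the top bit of byte
16 of the capability buffer.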

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 5a4e914e2a6f..2093131483c7 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -68,6 +68,7 @@ enum {
 	MLX5_SET_HCA_CAP_OP_MOD_ODP                   = 0x2,
 	MLX5_SET_HCA_CAP_OP_MOD_ATOMIC                = 0x3,
 	MLX5_SET_HCA_CAP_OP_MOD_ROCE                  = 0x4,
+	MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE2       = 0x20,
 	MLX5_SET_HCA_CAP_OP_MODE_PORT_SELECTION       = 0x25,
 };
 
@@ -1875,7 +1876,10 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 };
 
 struct mlx5_ifc_cmd_hca_cap_2_bits {
-	u8	   reserved_at_0[0xa0];
+	u8	   reserved_at_0[0x80];
+
+	u8         migratable[0x1];
+	u8         reserved_at_81[0x1f];
 
 	u8	   max_reformat_insert_size[0x8];
 	u8	   max_reformat_insert_offset[0x8];
-- 
2.38.1



* [PATCH net-next V3 2/8] devlink: Validate port function request
  2022-12-04 14:16 [PATCH net-next V3 0/8] devlink: Add port function attribute to enable/disable Roce and migratable Shay Drory
  2022-12-04 14:16 ` [PATCH net-next V3 1/8] net/mlx5: Introduce IFC bits for migratable Shay Drory
@ 2022-12-04 14:16 ` Shay Drory
  2022-12-04 14:16 ` [PATCH net-next V3 3/8] devlink: Move devlink port function hw_addr attr documentation Shay Drory
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Shay Drory @ 2022-12-04 14:16 UTC (permalink / raw)
  To: netdev, kuba, davem; +Cc: danielj, yishaih, jiri, saeedm, parav, Shay Drory

In order to avoid partial request processing, validate the request
before processing it.
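
As an illustration of the validate-first approach, here is a minimal
userspace sketch. It is not the devlink code: struct fn_ops, struct
fn_port, struct fn_request and the fn_request_*() helpers are hypothetical
names invented for this example, and hw_addr is reduced to an int for
brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct fn_port;

/* Hypothetical ops table; a NULL callback means "not supported". */
struct fn_ops {
	int (*hw_addr_set)(struct fn_port *port, int hw_addr);
	int (*state_set)(struct fn_port *port, int state);
};

struct fn_port {
	const struct fn_ops *ops;
	int hw_addr;
	int state;
};

/* Hypothetical parsed request: which attributes are present, and values. */
struct fn_request {
	bool has_hw_addr;
	int hw_addr;
	bool has_state;
	int state;
};

/* Phase 1: reject the whole request if any requested attribute is
 * unsupported, before anything has been changed.
 */
static int fn_request_validate(const struct fn_port *port,
			       const struct fn_request *req)
{
	assert(port && port->ops && req);
	if (req->has_hw_addr && !port->ops->hw_addr_set)
		return -EOPNOTSUPP;
	if (req->has_state && !port->ops->state_set)
		return -EOPNOTSUPP;
	return 0;
}

/* Phase 2: apply. Validation already ran, so every callback used here
 * is known to be non-NULL.
 */
static int fn_request_apply(struct fn_port *port, const struct fn_request *req)
{
	int err;

	err = fn_request_validate(port, req);
	if (err)
		return err;

	if (req->has_hw_addr) {
		err = port->ops->hw_addr_set(port, req->hw_addr);
		if (err)
			return err;
	}
	/* State goes last so other attributes land before activation. */
	if (req->has_state)
		err = port->ops->state_set(port, req->state);
	return err;
}

/* Example callback that just records the address. */
static int demo_hw_addr_set(struct fn_port *port, int hw_addr)
{
	port->hw_addr = hw_addr;
	return 0;
}
```

With an ops table that has demo_hw_addr_set but no state_set, a request
carrying both hw_addr and state returns -EOPNOTSUPP and leaves hw_addr
unchanged. Without the up-front check, hw_addr would have been applied
before the state failure was noticed.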

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
v2->v3:
 - replace NL_SET_ERR_MSG_MOD with NL_SET_ERR_MSG_ATTR
---
 net/core/devlink.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index fca3ebee97b0..2b6e11277837 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1644,11 +1644,6 @@ static int devlink_port_function_hw_addr_set(struct devlink_port *port,
 		}
 	}
 
-	if (!ops->port_function_hw_addr_set) {
-		NL_SET_ERR_MSG_MOD(extack, "Port doesn't support function attributes");
-		return -EOPNOTSUPP;
-	}
-
 	return ops->port_function_hw_addr_set(port, hw_addr, hw_addr_len,
 					      extack);
 }
@@ -1662,12 +1657,27 @@ static int devlink_port_fn_state_set(struct devlink_port *port,
 
 	state = nla_get_u8(attr);
 	ops = port->devlink->ops;
-	if (!ops->port_fn_state_set) {
-		NL_SET_ERR_MSG_MOD(extack,
-				   "Function does not support state setting");
+	return ops->port_fn_state_set(port, state, extack);
+}
+
+static int devlink_port_function_validate(struct devlink_port *devlink_port,
+					  struct nlattr **tb,
+					  struct netlink_ext_ack *extack)
+{
+	const struct devlink_ops *ops = devlink_port->devlink->ops;
+
+	if (tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] &&
+	    !ops->port_function_hw_addr_set) {
+		NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR],
+				    "Port doesn't support function attributes");
 		return -EOPNOTSUPP;
 	}
-	return ops->port_fn_state_set(port, state, extack);
+	if (tb[DEVLINK_PORT_FN_ATTR_STATE] && !ops->port_fn_state_set) {
+		NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FN_ATTR_STATE],
+				    "Function does not support state setting");
+		return -EOPNOTSUPP;
+	}
+	return 0;
 }
 
 static int devlink_port_function_set(struct devlink_port *port,
@@ -1684,6 +1694,10 @@ static int devlink_port_function_set(struct devlink_port *port,
 		return err;
 	}
 
+	err = devlink_port_function_validate(port, tb, extack);
+	if (err)
+		return err;
+
 	attr = tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR];
 	if (attr) {
 		err = devlink_port_function_hw_addr_set(port, attr, extack);
-- 
2.38.1



* [PATCH net-next V3 3/8] devlink: Move devlink port function hw_addr attr documentation
  2022-12-04 14:16 [PATCH net-next V3 0/8] devlink: Add port function attribute to enable/disable Roce and migratable Shay Drory
  2022-12-04 14:16 ` [PATCH net-next V3 1/8] net/mlx5: Introduce IFC bits for migratable Shay Drory
  2022-12-04 14:16 ` [PATCH net-next V3 2/8] devlink: Validate port function request Shay Drory
@ 2022-12-04 14:16 ` Shay Drory
  2022-12-04 14:16 ` [PATCH net-next V3 4/8] devlink: Expose port function commands to control RoCE Shay Drory
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Shay Drory @ 2022-12-04 14:16 UTC (permalink / raw)
  To: netdev, kuba, davem; +Cc: danielj, yishaih, jiri, saeedm, parav, Shay Drory

The devlink port function hw_addr attr documentation is in an mlx5
specific file, although there is nothing mlx5 specific about it.
Move it to devlink-port.rst.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
 .../device_drivers/ethernet/mellanox/mlx5.rst | 38 +----------------
 .../networking/devlink/devlink-port.rst       | 42 ++++++++++++++++++-
 2 files changed, 43 insertions(+), 37 deletions(-)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
index e8fa7ac9e6b1..07cfc1b07db3 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
@@ -351,42 +351,8 @@ driver.
 
 MAC address setup
 -----------------
-mlx5 driver provides mechanism to setup the MAC address of the PCI VF/SF.
-
-The configured MAC address of the PCI VF/SF will be used by netdevice and rdma
-device created for the PCI VF/SF.
-
-- Get the MAC address of the VF identified by its unique devlink port index::
-
-    $ devlink port show pci/0000:06:00.0/2
-    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
-      function:
-        hw_addr 00:00:00:00:00:00
-
-- Set the MAC address of the VF identified by its unique devlink port index::
-
-    $ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55
-
-    $ devlink port show pci/0000:06:00.0/2
-    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
-      function:
-        hw_addr 00:11:22:33:44:55
-
-- Get the MAC address of the SF identified by its unique devlink port index::
-
-    $ devlink port show pci/0000:06:00.0/32768
-    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
-      function:
-        hw_addr 00:00:00:00:00:00
-
-- Set the MAC address of the SF identified by its unique devlink port index::
-
-    $ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88
-
-    $ devlink port show pci/0000:06:00.0/32768
-    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
-      function:
-        hw_addr 00:00:00:00:88:88
+mlx5 driver supports the devlink port function attr mechanism to set up the MAC
+address. (refer to Documentation/networking/devlink/devlink-port.rst)
 
 SF state setup
 --------------
diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index 98557c2ab1c1..2c637f4aae8e 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -119,9 +119,49 @@ function device to the driver. For subfunctions, this means user should
 configure port function attribute before activating the port function.
 
 A user may set the hardware address of the function using
-'devlink port function set hw_addr' command. For Ethernet port function
+`devlink port function set hw_addr` command. For Ethernet port function
 this means a MAC address.
 
+Function attributes
+===================
+
+MAC address setup
+-----------------
+The configured MAC address of the PCI VF/SF will be used by netdevice and rdma
+device created for the PCI VF/SF.
+
+- Get the MAC address of the VF identified by its unique devlink port index::
+
+    $ devlink port show pci/0000:06:00.0/2
+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+      function:
+        hw_addr 00:00:00:00:00:00
+
+- Set the MAC address of the VF identified by its unique devlink port index::
+
+    $ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55
+
+    $ devlink port show pci/0000:06:00.0/2
+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+      function:
+        hw_addr 00:11:22:33:44:55
+
+- Get the MAC address of the SF identified by its unique devlink port index::
+
+    $ devlink port show pci/0000:06:00.0/32768
+    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
+      function:
+        hw_addr 00:00:00:00:00:00
+
+- Set the MAC address of the SF identified by its unique devlink port index::
+
+    $ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88
+
+    $ devlink port show pci/0000:06:00.0/32768
+    pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88
+      function:
+        hw_addr 00:00:00:00:88:88
+
 Subfunction
 ============
 
-- 
2.38.1



* [PATCH net-next V3 4/8] devlink: Expose port function commands to control RoCE
  2022-12-04 14:16 [PATCH net-next V3 0/8] devlink: Add port function attribute to enable/disable Roce and migratable Shay Drory
                   ` (2 preceding siblings ...)
  2022-12-04 14:16 ` [PATCH net-next V3 3/8] devlink: Move devlink port function hw_addr attr documentation Shay Drory
@ 2022-12-04 14:16 ` Shay Drory
  2022-12-05 10:12   ` Jiri Pirko
  2022-12-05 23:37   ` Shannon Nelson
  2022-12-04 14:16 ` [PATCH net-next V3 5/8] net/mlx5: Add generic getters for other functions caps Shay Drory
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 17+ messages in thread
From: Shay Drory @ 2022-12-04 14:16 UTC (permalink / raw)
  To: netdev, kuba, davem; +Cc: danielj, yishaih, jiri, saeedm, parav, Shay Drory

Expose port function commands to enable / disable RoCE. These are used
to control the port RoCE device capabilities.

When RoCE is disabled for a function of the port, the function cannot
create any RoCE specific resources (e.g. GID table).
Disabling RoCE also reduces system memory utilization. For example,
disabling RoCE on a VF/SF saves 1 Mbyte of system memory per function.

Example of a PCI VF port which supports function configuration:
Set RoCE of the VF's port function.

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
    function:
        hw_addr 00:00:00:00:00:00 roce enable

$ devlink port function set pci/0000:06:00.0/2 roce disable

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
    function:
        hw_addr 00:00:00:00:00:00 roce disable
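
The diff below carries the capability as a bitfield32 attribute. As a rough
userspace model of the selector/value semantics (not the kernel code:
struct bitfield32, the FN_CAP_* names and the fn_caps_*() helpers are
hypothetical, and the validity check only approximates what the netlink
policy enforces):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a bitfield32 attribute: 'selector' marks which bits the
 * message talks about, 'value' carries the state of those bits.
 */
struct bitfield32 {
	uint32_t value;
	uint32_t selector;
};

enum fn_cap_bit {
	FN_CAP_ROCE_BIT,
	/* Add new caps above */
	FN_CAP_BIT_MAX,
};

#define FN_CAP_ROCE		(UINT32_C(1) << FN_CAP_ROCE_BIT)
/* All defined capability bits; with one cap defined this is 0x1. */
#define FN_CAPS_VALID_MASK	((UINT32_C(1) << FN_CAP_BIT_MAX) - 1)

/* Report one capability: always select it, set the value bit only
 * when the capability is enabled.
 */
static void fn_cap_fill(struct bitfield32 *caps, uint32_t cap, bool is_enable)
{
	assert(caps);
	caps->selector |= cap;
	if (is_enable)
		caps->value |= cap;
}

/* Accept a request only if it selects known bits and sets no value
 * bit that is not also selected.
 */
static bool fn_caps_valid(const struct bitfield32 *caps)
{
	assert(caps);
	if (caps->selector & ~FN_CAPS_VALID_MASK)
		return false;
	return !(caps->value & ~caps->selector);
}

/* Decode a set request. Returns true if RoCE was selected and stores
 * the requested state in *enable; otherwise *enable is left untouched.
 */
static bool fn_caps_roce_requested(const struct bitfield32 *caps, bool *enable)
{
	uint32_t value;

	assert(caps && enable);
	value = caps->value & caps->selector;
	if (!(caps->selector & FN_CAP_ROCE))
		return false;
	*enable = value & FN_CAP_ROCE;
	return true;
}
```

In this model "roce disable" is selector = FN_CAP_ROCE with value = 0,
while a request that leaves the selector bit clear does not touch RoCE at
all. That distinction is what lets further capabilities share the same
attribute later.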

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
v2->v3:
 - change DEVLINK_PORT_FN_SET_CAP to devlink_port_fn_cap_fill.
 - move out DEVLINK_PORT_FN_CAPS_VALID_MASK from UAPI.
 - introduce DEVLINK_PORT_FN_CAP_ROCE and add _BIT suffix to
   devlink_port_fn_attr_cap.
 - remove DEVLINK_PORT_FN_ATTR_CAPS_MAX
---
 .../networking/devlink/devlink-port.rst       |  34 +++++-
 include/net/devlink.h                         |  19 +++
 include/uapi/linux/devlink.h                  |  10 ++
 net/core/devlink.c                            | 113 ++++++++++++++++++
 4 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index 2c637f4aae8e..c3302d23e480 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -110,7 +110,7 @@ devlink ports for both the controllers.
 Function configuration
 ======================
 
-A user can configure the function attribute before enumerating the PCI
+Users can configure one or more function attributes before enumerating the PCI
 function. Usually it means, user should configure function attribute
 before a bus specific device for the function is created. However, when
 SRIOV is enabled, virtual function devices are created on the PCI bus.
@@ -122,6 +122,9 @@ A user may set the hardware address of the function using
 `devlink port function set hw_addr` command. For Ethernet port function
 this means a MAC address.
 
+Users may also set the RoCE capability of the function using
+`devlink port function set roce` command.
+
 Function attributes
 ===================
 
@@ -162,6 +165,35 @@ device created for the PCI VF/SF.
       function:
         hw_addr 00:00:00:00:88:88
 
+RoCE capability setup
+---------------------
+Not all PCI VFs/SFs require RoCE capability.
+
+When RoCE capability is disabled, it saves system memory per PCI VF/SF.
+
+When user disables RoCE capability for a VF/SF, user application cannot send or
+receive any RoCE packets through this VF/SF and RoCE GID table for this PCI
+VF/SF will be empty.
+
+When RoCE capability is disabled in the device using port function attribute,
+VF/SF driver cannot override it.
+
+- Get RoCE capability of the VF device::
+
+    $ devlink port show pci/0000:06:00.0/2
+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+        function:
+            hw_addr 00:00:00:00:00:00 roce enable
+
+- Set RoCE capability of the VF device::
+
+    $ devlink port function set pci/0000:06:00.0/2 roce disable
+
+    $ devlink port show pci/0000:06:00.0/2
+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+        function:
+            hw_addr 00:00:00:00:00:00 roce disable
+
 Subfunction
 ============
 
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 5f6eca5e4a40..20306fb8a1d9 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1451,6 +1451,25 @@ struct devlink_ops {
 	int (*port_function_hw_addr_set)(struct devlink_port *port,
 					 const u8 *hw_addr, int hw_addr_len,
 					 struct netlink_ext_ack *extack);
+	/**
+	 * @port_function_roce_get: Port function's roce get function.
+	 *
+	 * Query RoCE state of a function managed by the devlink port.
+	 * Return -EOPNOTSUPP if port function RoCE handling is not supported.
+	 */
+	int (*port_function_roce_get)(struct devlink_port *devlink_port,
+				      bool *is_enable,
+				      struct netlink_ext_ack *extack);
+	/**
+	 * @port_function_roce_set: Port function's roce set function.
+	 *
+	 * Enable/Disable the RoCE state of a function managed by the devlink
+	 * port.
+	 * Return -EOPNOTSUPP if port function RoCE handling is not supported.
+	 */
+	int (*port_function_roce_set)(struct devlink_port *devlink_port,
+				      bool enable,
+				      struct netlink_ext_ack *extack);
 	/**
 	 * port_new() - Add a new port function of a specified flavor
 	 * @devlink: Devlink instance
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 70191d96af89..6cc2925bd478 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -658,11 +658,21 @@ enum devlink_resource_unit {
 	DEVLINK_RESOURCE_UNIT_ENTRY,
 };
 
+enum devlink_port_fn_attr_cap {
+	DEVLINK_PORT_FN_ATTR_CAP_ROCE_BIT,
+
+	/* Add new caps above */
+	__DEVLINK_PORT_FN_ATTR_CAPS_MAX,
+};
+
+#define DEVLINK_PORT_FN_CAP_ROCE _BITUL(DEVLINK_PORT_FN_ATTR_CAP_ROCE_BIT)
+
 enum devlink_port_function_attr {
 	DEVLINK_PORT_FUNCTION_ATTR_UNSPEC,
 	DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR,	/* binary */
 	DEVLINK_PORT_FN_ATTR_STATE,	/* u8 */
 	DEVLINK_PORT_FN_ATTR_OPSTATE,	/* u8 */
+	DEVLINK_PORT_FN_ATTR_CAPS,	/* bitfield32 */
 
 	__DEVLINK_PORT_FUNCTION_ATTR_MAX,
 	DEVLINK_PORT_FUNCTION_ATTR_MAX = __DEVLINK_PORT_FUNCTION_ATTR_MAX - 1
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 2b6e11277837..5c4d3abd7677 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -195,11 +195,16 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_hwmsg);
 EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_hwerr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report);
 
+#define DEVLINK_PORT_FN_CAPS_VALID_MASK \
+	(_BITUL(__DEVLINK_PORT_FN_ATTR_CAPS_MAX) - 1)
+
 static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_ATTR_MAX + 1] = {
 	[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] = { .type = NLA_BINARY },
 	[DEVLINK_PORT_FN_ATTR_STATE] =
 		NLA_POLICY_RANGE(NLA_U8, DEVLINK_PORT_FN_STATE_INACTIVE,
 				 DEVLINK_PORT_FN_STATE_ACTIVE),
+	[DEVLINK_PORT_FN_ATTR_CAPS] =
+		NLA_POLICY_BITFIELD32(DEVLINK_PORT_FN_CAPS_VALID_MASK),
 };
 
 static const struct nla_policy devlink_selftest_nl_policy[DEVLINK_ATTR_SELFTEST_ID_MAX + 1] = {
@@ -692,6 +697,60 @@ devlink_sb_tc_index_get_from_attrs(struct devlink_sb *devlink_sb,
 	return 0;
 }
 
+static void devlink_port_fn_cap_fill(struct nla_bitfield32 *caps,
+				     u32 cap, bool is_enable)
+{
+	caps->selector |= cap;
+	if (is_enable)
+		caps->value |= cap;
+}
+
+static int devlink_port_fn_roce_fill(const struct devlink_ops *ops,
+				     struct devlink_port *devlink_port,
+				     struct nla_bitfield32 *caps,
+				     struct netlink_ext_ack *extack)
+{
+	bool is_enable;
+	int err;
+
+	if (!ops->port_function_roce_get)
+		return 0;
+
+	err = ops->port_function_roce_get(devlink_port, &is_enable, extack);
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			return 0;
+		return err;
+	}
+
+	devlink_port_fn_cap_fill(caps, DEVLINK_PORT_FN_CAP_ROCE, is_enable);
+	return 0;
+}
+
+static int devlink_port_fn_caps_fill(const struct devlink_ops *ops,
+				     struct devlink_port *devlink_port,
+				     struct sk_buff *msg,
+				     struct netlink_ext_ack *extack,
+				     bool *msg_updated)
+{
+	struct nla_bitfield32 caps = {};
+	int err;
+
+	err = devlink_port_fn_roce_fill(ops, devlink_port, &caps, extack);
+	if (err)
+		return err;
+
+	if (!caps.selector)
+		return 0;
+	err = nla_put_bitfield32(msg, DEVLINK_PORT_FN_ATTR_CAPS, caps.value,
+				 caps.selector);
+	if (err)
+		return err;
+
+	*msg_updated = true;
+	return 0;
+}
+
 static int
 devlink_sb_tc_index_get_from_info(struct devlink_sb *devlink_sb,
 				  struct genl_info *info,
@@ -1275,6 +1334,35 @@ static int devlink_port_fn_state_fill(const struct devlink_ops *ops,
 	return 0;
 }
 
+static int
+devlink_port_fn_roce_set(struct devlink_port *devlink_port, bool enable,
+			 struct netlink_ext_ack *extack)
+{
+	const struct devlink_ops *ops = devlink_port->devlink->ops;
+
+	return ops->port_function_roce_set(devlink_port, enable, extack);
+}
+
+static int devlink_port_fn_caps_set(struct devlink_port *devlink_port,
+				    const struct nlattr *attr,
+				    struct netlink_ext_ack *extack)
+{
+	struct nla_bitfield32 caps;
+	u32 caps_value;
+	int err;
+
+	caps = nla_get_bitfield32(attr);
+	caps_value = caps.value & caps.selector;
+	if (caps.selector & DEVLINK_PORT_FN_CAP_ROCE) {
+		err = devlink_port_fn_roce_set(devlink_port,
+					       caps_value & DEVLINK_PORT_FN_CAP_ROCE,
+					       extack);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
 static int
 devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port,
 				   struct netlink_ext_ack *extack)
@@ -1293,6 +1381,10 @@ devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *por
 					   &msg_updated);
 	if (err)
 		goto out;
+	err = devlink_port_fn_caps_fill(ops, port, msg, extack,
+					&msg_updated);
+	if (err)
+		goto out;
 	err = devlink_port_fn_state_fill(ops, port, msg, extack, &msg_updated);
 out:
 	if (err || !msg_updated)
@@ -1665,6 +1757,7 @@ static int devlink_port_function_validate(struct devlink_port *devlink_port,
 					  struct netlink_ext_ack *extack)
 {
 	const struct devlink_ops *ops = devlink_port->devlink->ops;
+	struct nlattr *attr;
 
 	if (tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] &&
 	    !ops->port_function_hw_addr_set) {
@@ -1677,6 +1770,18 @@ static int devlink_port_function_validate(struct devlink_port *devlink_port,
 				   "Function does not support state setting");
 		return -EOPNOTSUPP;
 	}
+	attr = tb[DEVLINK_PORT_FN_ATTR_CAPS];
+	if (attr) {
+		struct nla_bitfield32 caps;
+
+		caps = nla_get_bitfield32(attr);
+		if (caps.selector & DEVLINK_PORT_FN_CAP_ROCE &&
+		    !ops->port_function_roce_set) {
+			NL_SET_ERR_MSG_ATTR(extack, attr,
+					    "Port doesn't support RoCE function attribute");
+			return -EOPNOTSUPP;
+		}
+	}
 	return 0;
 }
 
@@ -1704,6 +1809,14 @@ static int devlink_port_function_set(struct devlink_port *port,
 		if (err)
 			return err;
 	}
+
+	attr = tb[DEVLINK_PORT_FN_ATTR_CAPS];
+	if (attr) {
+		err = devlink_port_fn_caps_set(port, attr, extack);
+		if (err)
+			return err;
+	}
+
 	/* Keep this as the last function attribute set, so that when
 	 * multiple port function attributes are set along with state,
 	 * Those can be applied first before activating the state.
-- 
2.38.1



* [PATCH net-next V3 5/8] net/mlx5: Add generic getters for other functions caps
  2022-12-04 14:16 [PATCH net-next V3 0/8] devlink: Add port function attribute to enable/disable Roce and migratable Shay Drory
                   ` (3 preceding siblings ...)
  2022-12-04 14:16 ` [PATCH net-next V3 4/8] devlink: Expose port function commands to control RoCE Shay Drory
@ 2022-12-04 14:16 ` Shay Drory
  2022-12-04 14:16 ` [PATCH net-next V3 6/8] net/mlx5: E-Switch, Implement devlink port function cmds to control RoCE Shay Drory
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Shay Drory @ 2022-12-04 14:16 UTC (permalink / raw)
  To: netdev, kuba, davem; +Cc: danielj, yishaih, jiri, saeedm, parav, Shay Drory

A downstream patch needs to get another function's GENERAL2 caps, while
mlx5_vport_get_other_func_cap() gets only one type of caps (general).
Rename it to mlx5_vport_get_other_func_general_cap() to reflect this,
and introduce a generic implementation of
mlx5_vport_get_other_func_cap().
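
The generic version builds the command op_mod from the capability type
passed by the caller. A minimal sketch of that encoding follows. The enum
names and numeric values are assumptions made for illustration (the real
definitions live in the mlx5 headers), and query_hca_cap_opmod() is a
hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed capability type values, for illustration only. */
enum cap_type {
	CAP_GENERAL   = 0x0,
	CAP_GENERAL_2 = 0x20,
};

/* Assumed selector values: query maximum vs. current capabilities. */
enum cap_opmod {
	CAP_OPMOD_GET_MAX = 0,
	CAP_OPMOD_GET_CUR = 1,
};

/* Capability type goes in the upper bits, the max/current selector
 * in bit 0, matching the (opmod << 1) | (mode & 0x01) expression in
 * the diff.
 */
static uint16_t query_hca_cap_opmod(uint16_t cap_type, enum cap_opmod mode)
{
	assert(cap_type <= 0x7fff);	/* must survive the shift in 16 bits */
	return (uint16_t)((cap_type << 1) | (mode & 0x01));
}
```

With these assumed values, the general caps query uses op_mod 0x0 and the
GENERAL2 query uses 0x40, so a single function can serve both once the
type is a parameter.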

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h        | 3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c          | 3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c            | 6 ++++--
 include/linux/mlx5/vport.h                                 | 2 ++
 5 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 9b6fbb19c22a..33dffcb8bdd7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -3889,7 +3889,7 @@ static int mlx5_esw_query_vport_vhca_id(struct mlx5_eswitch *esw, u16 vport_num,
 	if (!query_ctx)
 		return -ENOMEM;
 
-	err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx);
+	err = mlx5_vport_get_other_func_general_cap(esw->dev, vport_num, query_ctx);
 	if (err)
 		goto out_free;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index a806e3de7b7c..09473983778f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -324,7 +324,8 @@ void mlx5_unload_one_devl_locked(struct mlx5_core_dev *dev);
 int mlx5_load_one(struct mlx5_core_dev *dev, bool recovery);
 int mlx5_load_one_devl_locked(struct mlx5_core_dev *dev, bool recovery);
 
-int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 function_id, void *out);
+#define mlx5_vport_get_other_func_general_cap(dev, fid, out)		\
+	mlx5_vport_get_other_func_cap(dev, fid, out, MLX5_CAP_GENERAL)
 
 void mlx5_events_work_enqueue(struct mlx5_core_dev *dev, struct work_struct *work);
 static inline u32 mlx5_sriov_get_vf_total_msix(struct pci_dev *pdev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
index 662f1d55e30e..6bde18bcd42f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
@@ -4,6 +4,7 @@
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
 #include <linux/mlx5/driver.h>
+#include <linux/mlx5/vport.h>
 #include "mlx5_core.h"
 #include "mlx5_irq.h"
 #include "pci_irq.h"
@@ -101,7 +102,7 @@ int mlx5_set_msix_vec_count(struct mlx5_core_dev *dev, int function_id,
 		goto out;
 	}
 
-	ret = mlx5_vport_get_other_func_cap(dev, function_id, query_cap);
+	ret = mlx5_vport_get_other_func_general_cap(dev, function_id, query_cap);
 	if (ret)
 		goto out;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index d5c317325030..7eca7582f243 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -1160,14 +1160,16 @@ u64 mlx5_query_nic_system_image_guid(struct mlx5_core_dev *mdev)
 }
 EXPORT_SYMBOL_GPL(mlx5_query_nic_system_image_guid);
 
-int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 function_id, void *out)
+int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 function_id, void *out,
+				  u16 opmod)
 {
-	u16 opmod = (MLX5_CAP_GENERAL << 1) | (HCA_CAP_OPMOD_GET_MAX & 0x01);
 	u8 in[MLX5_ST_SZ_BYTES(query_hca_cap_in)] = {};
 
+	opmod = (opmod << 1) | (HCA_CAP_OPMOD_GET_MAX & 0x01);
 	MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
 	MLX5_SET(query_hca_cap_in, in, op_mod, opmod);
 	MLX5_SET(query_hca_cap_in, in, function_id, function_id);
 	MLX5_SET(query_hca_cap_in, in, other_function, true);
 	return mlx5_cmd_exec_inout(dev, query_hca_cap, in, out);
 }
+EXPORT_SYMBOL_GPL(mlx5_vport_get_other_func_cap);
diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h
index aad53cb72f17..7f31432f44c2 100644
--- a/include/linux/mlx5/vport.h
+++ b/include/linux/mlx5/vport.h
@@ -132,4 +132,6 @@ int mlx5_nic_vport_affiliate_multiport(struct mlx5_core_dev *master_mdev,
 int mlx5_nic_vport_unaffiliate_multiport(struct mlx5_core_dev *port_mdev);
 
 u64 mlx5_query_nic_system_image_guid(struct mlx5_core_dev *mdev);
+int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 function_id, void *out,
+				  u16 opmod);
 #endif /* __MLX5_VPORT_H__ */
-- 
2.38.1
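
The op_mod computed in mlx5_vport_get_other_func_cap() places the capability
type above a one-bit max/current selector. Below is a minimal user-space
sketch of that packing. The sketch_* names and the numeric values are
assumptions made for illustration; they are not quoted from the mlx5 headers.

```c
#include <stdint.h>

/* Illustrative values, assumed for this sketch only. */
enum { SKETCH_OPMOD_GET_MAX = 0, SKETCH_OPMOD_GET_CUR = 1 };
enum { SKETCH_CAP_GENERAL = 0x0, SKETCH_CAP_GENERAL_2 = 0x20 };

/* Capability type goes in the upper bits, the max/current selector in bit 0. */
static uint16_t sketch_query_opmod(uint16_t cap_type, uint16_t mode)
{
	return (uint16_t)((cap_type << 1) | (mode & 0x01));
}

/* Inverse helpers, handy when decoding an op_mod seen in a trace. */
static uint16_t sketch_opmod_cap_type(uint16_t opmod)
{
	return opmod >> 1;
}

static uint16_t sketch_opmod_mode(uint16_t opmod)
{
	return opmod & 0x01;
}
```

Passing the capability type as a parameter, as the patch does, lets one helper
serve every capability group instead of hard-coding the general group.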



* [PATCH net-next V3 6/8] net/mlx5: E-Switch, Implement devlink port function cmds to control RoCE
  2022-12-04 14:16 [PATCH net-next V3 0/8] devlink: Add port function attribute to enable/disable Roce and migratable Shay Drory
                   ` (4 preceding siblings ...)
  2022-12-04 14:16 ` [PATCH net-next V3 5/8] net/mlx5: Add generic getters for other functions caps Shay Drory
@ 2022-12-04 14:16 ` Shay Drory
  2022-12-04 14:16 ` [PATCH net-next V3 7/8] devlink: Expose port function commands to control migratable Shay Drory
  2022-12-04 14:16 ` [PATCH net-next V3 8/8] net/mlx5: E-Switch, Implement devlink port function cmds " Shay Drory
  7 siblings, 0 replies; 17+ messages in thread
From: Shay Drory @ 2022-12-04 14:16 UTC (permalink / raw)
  To: netdev, kuba, davem; +Cc: danielj, yishaih, jiri, saeedm, parav, Shay Drory

From: Yishai Hadas <yishaih@nvidia.com>

Implement devlink port function commands to enable / disable RoCE.
This is used to control the RoCE device capabilities.

This patch implements the infrastructure that downstream patches will
use to add further capabilities.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../device_drivers/ethernet/mellanox/mlx5.rst |  10 ++
 .../net/ethernet/mellanox/mlx5/core/devlink.c |   2 +
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |  35 ++++++
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |   6 +-
 .../mellanox/mlx5/core/eswitch_offloads.c     | 108 ++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   2 +
 .../net/ethernet/mellanox/mlx5/core/vport.c   |  24 ++++
 7 files changed, 186 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
index 07cfc1b07db3..8b8f95d1293a 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
@@ -354,6 +354,16 @@ MAC address setup
 mlx5 driver support devlink port function attr mechanism to setup MAC
 address. (refer to Documentation/networking/devlink/devlink-port.rst)
 
+RoCE capability setup
+---------------------
+Not all mlx5 PCI devices/SFs require RoCE capability.
+
+When RoCE capability is disabled, it saves 1 Mbyte of system memory per
+PCI devices/SF.
+
+mlx5 driver support devlink port function attr mechanism to setup RoCE
+capability. (refer to Documentation/networking/devlink/devlink-port.rst)
+
 SF state setup
 --------------
 To use the SF, the user must activate the SF using the SF function state
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 751bc4a9edcf..992cdb3b7cc8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -314,6 +314,8 @@ static const struct devlink_ops mlx5_devlink_ops = {
 	.rate_node_new = mlx5_esw_devlink_rate_node_new,
 	.rate_node_del = mlx5_esw_devlink_rate_node_del,
 	.rate_leaf_parent_set = mlx5_esw_devlink_rate_parent_set,
+	.port_function_roce_get = mlx5_devlink_port_function_roce_get,
+	.port_function_roce_set = mlx5_devlink_port_function_roce_set,
 #endif
 #ifdef CONFIG_MLX5_SF_MANAGER
 	.port_new = mlx5_devlink_sf_port_new,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 374e3fbdc2cf..001fb1e62135 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -772,6 +772,33 @@ static void esw_vport_cleanup_acl(struct mlx5_eswitch *esw,
 		esw_vport_destroy_offloads_acl_tables(esw, vport);
 }
 
+static int mlx5_esw_vport_caps_get(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
+{
+	int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out);
+	void *query_ctx;
+	void *hca_caps;
+	int err;
+
+	if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager))
+		return 0;
+
+	query_ctx = kzalloc(query_out_sz, GFP_KERNEL);
+	if (!query_ctx)
+		return -ENOMEM;
+
+	err = mlx5_vport_get_other_func_cap(esw->dev, vport->vport, query_ctx,
+					    MLX5_CAP_GENERAL);
+	if (err)
+		goto out_free;
+
+	hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability);
+	vport->info.roce_enabled = MLX5_GET(cmd_hca_cap, hca_caps, roce);
+
+out_free:
+	kfree(query_ctx);
+	return err;
+}
+
 static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
 {
 	u16 vport_num = vport->vport;
@@ -785,6 +812,10 @@ static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
 	if (mlx5_esw_is_manager_vport(esw, vport_num))
 		return 0;
 
+	err = mlx5_esw_vport_caps_get(esw, vport);
+	if (err)
+		goto err_caps;
+
 	mlx5_modify_vport_admin_state(esw->dev,
 				      MLX5_VPORT_STATE_OP_MOD_ESW_VPORT,
 				      vport_num, 1,
@@ -804,6 +835,10 @@ static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
 			       vport->info.qos, flags);
 
 	return 0;
+
+err_caps:
+	esw_vport_cleanup_acl(esw, vport);
+	return err;
 }
 
 /* Don't cleanup vport->info, it's needed to restore vport configuration */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 42d9df417e20..71f27fb35c49 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -153,6 +153,7 @@ struct mlx5_vport_info {
 	u8                      qos;
 	u8                      spoofchk: 1;
 	u8                      trusted: 1;
+	u8                      roce_enabled: 1;
 };
 
 /* Vport context events */
@@ -508,7 +509,10 @@ int mlx5_devlink_port_function_hw_addr_get(struct devlink_port *port,
 int mlx5_devlink_port_function_hw_addr_set(struct devlink_port *port,
 					   const u8 *hw_addr, int hw_addr_len,
 					   struct netlink_ext_ack *extack);
-
+int mlx5_devlink_port_function_roce_get(struct devlink_port *port, bool *is_enabled,
+					struct netlink_ext_ack *extack);
+int mlx5_devlink_port_function_roce_set(struct devlink_port *port, bool enable,
+					struct netlink_ext_ack *extack);
 void *mlx5_eswitch_get_uplink_priv(struct mlx5_eswitch *esw, u8 rep_type);
 
 int mlx5_eswitch_add_vlan_action(struct mlx5_eswitch *esw,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 33dffcb8bdd7..f258fd7e27a8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -4022,3 +4022,111 @@ int mlx5_devlink_port_function_hw_addr_set(struct devlink_port *port,
 
 	return mlx5_eswitch_set_vport_mac(esw, vport_num, hw_addr);
 }
+
+static struct mlx5_vport *
+mlx5_devlink_port_function_get_vport(struct devlink_port *port, struct mlx5_eswitch *esw)
+{
+	u16 vport_num;
+
+	if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	vport_num = mlx5_esw_devlink_port_index_to_vport_num(port->index);
+	if (!is_port_function_supported(esw, vport_num))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	return mlx5_eswitch_get_vport(esw, vport_num);
+}
+
+int mlx5_devlink_port_function_roce_get(struct devlink_port *port, bool *is_enabled,
+					struct netlink_ext_ack *extack)
+{
+	struct mlx5_eswitch *esw;
+	struct mlx5_vport *vport;
+	int err = -EOPNOTSUPP;
+
+	esw = mlx5_devlink_eswitch_get(port->devlink);
+	if (IS_ERR(esw))
+		return PTR_ERR(esw);
+
+	vport = mlx5_devlink_port_function_get_vport(port, esw);
+	if (IS_ERR(vport)) {
+		NL_SET_ERR_MSG_MOD(extack, "Invalid port");
+		return PTR_ERR(vport);
+	}
+
+	mutex_lock(&esw->state_lock);
+	if (vport->enabled) {
+		*is_enabled = vport->info.roce_enabled;
+		err = 0;
+	}
+	mutex_unlock(&esw->state_lock);
+	return err;
+}
+
+int mlx5_devlink_port_function_roce_set(struct devlink_port *port, bool enable,
+					struct netlink_ext_ack *extack)
+{
+	int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out);
+	struct mlx5_eswitch *esw;
+	struct mlx5_vport *vport;
+	int err = -EOPNOTSUPP;
+	void *query_ctx;
+	void *hca_caps;
+	u16 vport_num;
+
+	esw = mlx5_devlink_eswitch_get(port->devlink);
+	if (IS_ERR(esw))
+		return PTR_ERR(esw);
+
+	vport = mlx5_devlink_port_function_get_vport(port, esw);
+	if (IS_ERR(vport)) {
+		NL_SET_ERR_MSG_MOD(extack, "Invalid port");
+		return PTR_ERR(vport);
+	}
+	vport_num = vport->vport;
+
+	mutex_lock(&esw->state_lock);
+	if (!vport->enabled) {
+		NL_SET_ERR_MSG_MOD(extack, "Eswitch vport is disabled");
+		goto out;
+	}
+
+	if (vport->info.roce_enabled == enable) {
+		err = 0;
+		goto out;
+	}
+
+	query_ctx = kzalloc(query_out_sz, GFP_KERNEL);
+	if (!query_ctx) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx,
+					    MLX5_CAP_GENERAL);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps");
+		goto out_free;
+	}
+
+	hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability);
+	memcpy(hca_caps, MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability),
+	       MLX5_UN_SZ_BYTES(hca_cap_union));
+	MLX5_SET(cmd_hca_cap, hca_caps, roce, enable);
+
+	err = mlx5_vport_set_other_func_cap(esw->dev, hca_caps, vport_num,
+					    MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed setting HCA roce cap");
+		goto out_free;
+	}
+
+	vport->info.roce_enabled = enable;
+
+out_free:
+	kfree(query_ctx);
+out:
+	mutex_unlock(&esw->state_lock);
+	return err;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 09473983778f..029305a8b80a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -324,6 +324,8 @@ void mlx5_unload_one_devl_locked(struct mlx5_core_dev *dev);
 int mlx5_load_one(struct mlx5_core_dev *dev, bool recovery);
 int mlx5_load_one_devl_locked(struct mlx5_core_dev *dev, bool recovery);
 
+int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap, u16 function_id,
+				  u16 opmod);
 #define mlx5_vport_get_other_func_general_cap(dev, fid, out)		\
 	mlx5_vport_get_other_func_cap(dev, fid, out, MLX5_CAP_GENERAL)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 7eca7582f243..ba7e3df22413 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -1173,3 +1173,27 @@ int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 function_id, vo
 	return mlx5_cmd_exec_inout(dev, query_hca_cap, in, out);
 }
 EXPORT_SYMBOL_GPL(mlx5_vport_get_other_func_cap);
+
+int mlx5_vport_set_other_func_cap(struct mlx5_core_dev *dev, const void *hca_cap,
+				  u16 function_id, u16 opmod)
+{
+	int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
+	void *set_hca_cap;
+	void *set_ctx;
+	int ret;
+
+	set_ctx = kzalloc(set_sz, GFP_KERNEL);
+	if (!set_ctx)
+		return -ENOMEM;
+
+	MLX5_SET(set_hca_cap_in, set_ctx, opcode, MLX5_CMD_OP_SET_HCA_CAP);
+	MLX5_SET(set_hca_cap_in, set_ctx, op_mod, opmod << 1);
+	set_hca_cap = MLX5_ADDR_OF(set_hca_cap_in, set_ctx, capability);
+	memcpy(set_hca_cap, hca_cap, MLX5_ST_SZ_BYTES(cmd_hca_cap));
+	MLX5_SET(set_hca_cap_in, set_ctx, function_id, function_id);
+	MLX5_SET(set_hca_cap_in, set_ctx, other_function, true);
+	ret = mlx5_cmd_exec_in(dev, set_hca_cap, set_ctx);
+
+	kfree(set_ctx);
+	return ret;
+}
-- 
2.38.1
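
The set path in this patch is a read-modify-write: query the function's
current capabilities, change one bit, write the block back, and update the
cached state only on success. The following user-space sketch models that
flow with a mock firmware. All sketch_* names, the buffer size and the bit
position are hypothetical and stand in for the real mlx5 structures.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_CAP_BYTES 16
#define SKETCH_ROCE_BYTE 0	/* hypothetical position of the roce bit */
#define SKETCH_ROCE_MASK 0x01

struct sketch_fw {		/* stands in for the device firmware */
	unsigned char caps[SKETCH_CAP_BYTES];
	int set_calls;
};

struct sketch_vport {
	bool enabled;
	bool roce_enabled;	/* cached copy of the firmware state */
};

static int sketch_query_caps(const struct sketch_fw *fw, unsigned char *out)
{
	memcpy(out, fw->caps, SKETCH_CAP_BYTES);
	return 0;
}

static int sketch_set_caps(struct sketch_fw *fw, const unsigned char *in)
{
	memcpy(fw->caps, in, SKETCH_CAP_BYTES);
	fw->set_calls++;
	return 0;
}

static int sketch_roce_set(struct sketch_fw *fw, struct sketch_vport *vport,
			   bool enable)
{
	unsigned char caps[SKETCH_CAP_BYTES];
	int err;

	if (!vport->enabled)
		return -EOPNOTSUPP;
	if (vport->roce_enabled == enable)
		return 0;	/* nothing to change, skip the firmware */

	err = sketch_query_caps(fw, caps);
	if (err)
		return err;

	/* Change only the roce bit; every other capability is preserved. */
	if (enable)
		caps[SKETCH_ROCE_BYTE] |= SKETCH_ROCE_MASK;
	else
		caps[SKETCH_ROCE_BYTE] &= (unsigned char)~SKETCH_ROCE_MASK;

	err = sketch_set_caps(fw, caps);
	if (err)
		return err;

	vport->roce_enabled = enable;	/* update the cache only after success */
	return 0;
}
```

The sketch leaves out the locking; in the patch the equivalent steps run under
esw->state_lock.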



* [PATCH net-next V3 7/8] devlink: Expose port function commands to control migratable
  2022-12-04 14:16 [PATCH net-next V3 0/8] devlink: Add port function attribute to enable/disable Roce and migratable Shay Drory
                   ` (5 preceding siblings ...)
  2022-12-04 14:16 ` [PATCH net-next V3 6/8] net/mlx5: E-Switch, Implement devlink port function cmds to control RoCE Shay Drory
@ 2022-12-04 14:16 ` Shay Drory
  2022-12-05 23:37   ` Shannon Nelson
  2022-12-04 14:16 ` [PATCH net-next V3 8/8] net/mlx5: E-Switch, Implement devlink port function cmds " Shay Drory
  7 siblings, 1 reply; 17+ messages in thread
From: Shay Drory @ 2022-12-04 14:16 UTC (permalink / raw)
  To: netdev, kuba, davem
  Cc: danielj, yishaih, jiri, saeedm, parav, Shay Drory, Shannon Nelson

Expose port function commands to enable / disable the migratable
capability. This is used to set the port function as migratable.

Live migration (LM) is the process of transferring a live virtual machine
from one physical host to another without disrupting its normal
operation.

In order for a VM to be able to perform LM, all the VM components must
be able to perform migration, i.e. be migratable.
In order for a VF to be migratable, the VF must be bound to a VFIO driver
with migration support.

When the migratable capability is enabled for a function of the port, the
device makes the necessary preparations for the function to be
migratable, which might include disabling features which cannot be
migrated.

Example of LM with migratable function configuration:
Set migratable of the VF's port function.

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
    function:
        hw_addr 00:00:00:00:00:00 migratable disable

$ devlink port function set pci/0000:06:00.0/2 migratable enable

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
    function:
        hw_addr 00:00:00:00:00:00 migratable enable

Bind VF to VFIO driver with migration support:
$ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind
$ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override
$ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/bind

Attach VF to the VM.
Start the VM.
Perform LM.

Cc: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
v2->v3:
 - fix documentation warning
 - introduce DEVLINK_PORT_FN_CAP_MIGRATABLE
v1->v2:
 - fix documentation warning
---
 .../networking/devlink/devlink-port.rst       | 46 ++++++++++++++++
 include/net/devlink.h                         | 21 +++++++
 include/uapi/linux/devlink.h                  |  3 +
 net/core/devlink.c                            | 55 +++++++++++++++++++
 4 files changed, 125 insertions(+)

diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index c3302d23e480..3da590953ce8 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -125,6 +125,9 @@ this means a MAC address.
 Users may also set the RoCE capability of the function using
 `devlink port function set roce` command.
 
+Users may also set the function as migratable using
+`devlink port function set migratable` command.
+
 Function attributes
 ===================
 
@@ -194,6 +197,49 @@ VF/SF driver cannot override it.
         function:
             hw_addr 00:00:00:00:00:00 roce disable
 
+migratable capability setup
+---------------------------
+Live migration is the process of transferring a live virtual machine
+from one physical host to another without disrupting its normal
+operation.
+
+Users who want PCI VFs to be able to perform live migration need to
+explicitly enable the VF migratable capability.
+
+When the user enables the migratable capability for a VF, and the HV binds the VF to a
+VFIO driver with migration support, the user can migrate the VM with this VF from one
+HV to a different one.
+
+However, when the migratable capability is enabled, the device will disable features
+which cannot be migrated. It can thus limit a VF, so the choice is left to the user.
+
+Example of LM with migratable function configuration:
+- Get migratable capability of the VF device::
+
+    $ devlink port show pci/0000:06:00.0/2
+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+        function:
+            hw_addr 00:00:00:00:00:00 migratable disable
+
+- Set migratable capability of the VF device::
+
+    $ devlink port function set pci/0000:06:00.0/2 migratable enable
+
+    $ devlink port show pci/0000:06:00.0/2
+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+        function:
+            hw_addr 00:00:00:00:00:00 migratable enable
+
+- Bind VF to VFIO driver with migration support::
+
+    $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind
+    $ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override
+    $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/bind
+
+Attach VF to the VM.
+Start the VM.
+Perform live migration.
+
 Subfunction
 ============
 
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 20306fb8a1d9..fdb5e8da33ce 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1470,6 +1470,27 @@ struct devlink_ops {
 	int (*port_function_roce_set)(struct devlink_port *devlink_port,
 				      bool enable,
 				      struct netlink_ext_ack *extack);
+	/**
+	 * @port_function_mig_get: Port function's migratable get function.
+	 *
+	 * Query migratable state of a function managed by the devlink port.
+	 * Return -EOPNOTSUPP if port function migratable handling is not
+	 * supported.
+	 */
+	int (*port_function_mig_get)(struct devlink_port *devlink_port,
+				     bool *is_enable,
+				     struct netlink_ext_ack *extack);
+	/**
+	 * @port_function_mig_set: Port function's migratable set function.
+	 *
+	 * Enable/Disable migratable state of a function managed by the devlink
+	 * port.
+	 * Return -EOPNOTSUPP if port function migratable handling is not
+	 * supported.
+	 */
+	int (*port_function_mig_set)(struct devlink_port *devlink_port,
+				     bool enable,
+				     struct netlink_ext_ack *extack);
 	/**
 	 * port_new() - Add a new port function of a specified flavor
 	 * @devlink: Devlink instance
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 6cc2925bd478..3782d4219ac9 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -660,12 +660,15 @@ enum devlink_resource_unit {
 
 enum devlink_port_fn_attr_cap {
 	DEVLINK_PORT_FN_ATTR_CAP_ROCE_BIT,
+	DEVLINK_PORT_FN_ATTR_CAP_MIGRATABLE_BIT,
 
 	/* Add new caps above */
 	__DEVLINK_PORT_FN_ATTR_CAPS_MAX,
 };
 
 #define DEVLINK_PORT_FN_CAP_ROCE _BITUL(DEVLINK_PORT_FN_ATTR_CAP_ROCE_BIT)
+#define DEVLINK_PORT_FN_CAP_MIGRATABLE \
+	_BITUL(DEVLINK_PORT_FN_ATTR_CAP_MIGRATABLE_BIT)
 
 enum devlink_port_function_attr {
 	DEVLINK_PORT_FUNCTION_ATTR_UNSPEC,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 5c4d3abd7677..bf2c1d3d6df3 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -727,6 +727,29 @@ static int devlink_port_fn_roce_fill(const struct devlink_ops *ops,
 	return 0;
 }
 
+static int devlink_port_function_mig_fill(const struct devlink_ops *ops,
+					  struct devlink_port *devlink_port,
+					  struct nla_bitfield32 *caps,
+					  struct netlink_ext_ack *extack)
+{
+	bool is_enable;
+	int err;
+
+	if (!ops->port_function_mig_get ||
+	    devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_VF)
+		return 0;
+
+	err = ops->port_function_mig_get(devlink_port, &is_enable, extack);
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			return 0;
+		return err;
+	}
+
+	devlink_port_fn_cap_fill(caps, DEVLINK_PORT_FN_CAP_MIGRATABLE, is_enable);
+	return 0;
+}
+
 static int devlink_port_fn_caps_fill(const struct devlink_ops *ops,
 				     struct devlink_port *devlink_port,
 				     struct sk_buff *msg,
@@ -740,6 +763,10 @@ static int devlink_port_fn_caps_fill(const struct devlink_ops *ops,
 	if (err)
 		return err;
 
+	err = devlink_port_function_mig_fill(ops, devlink_port, &caps, extack);
+	if (err)
+		return err;
+
 	if (!caps.selector)
 		return 0;
 	err = nla_put_bitfield32(msg, DEVLINK_PORT_FN_ATTR_CAPS, caps.value,
@@ -1334,6 +1361,15 @@ static int devlink_port_fn_state_fill(const struct devlink_ops *ops,
 	return 0;
 }
 
+static int
+devlink_port_fn_mig_set(struct devlink_port *devlink_port, bool enable,
+			struct netlink_ext_ack *extack)
+{
+	const struct devlink_ops *ops = devlink_port->devlink->ops;
+
+	return ops->port_function_mig_set(devlink_port, enable, extack);
+}
+
 static int
 devlink_port_fn_roce_set(struct devlink_port *devlink_port, bool enable,
 			 struct netlink_ext_ack *extack)
@@ -1360,6 +1396,13 @@ static int devlink_port_fn_caps_set(struct devlink_port *devlink_port,
 		if (err)
 			return err;
 	}
+	if (caps.selector & DEVLINK_PORT_FN_CAP_MIGRATABLE) {
+		err = devlink_port_fn_mig_set(devlink_port, caps_value &
+					      DEVLINK_PORT_FN_CAP_MIGRATABLE,
+					      extack);
+		if (err)
+			return err;
+	}
 	return 0;
 }
 
@@ -1781,6 +1824,18 @@ static int devlink_port_function_validate(struct devlink_port *devlink_port,
 					    "Port doesn't support RoCE function attribute");
 			return -EOPNOTSUPP;
 		}
+		if (caps.selector & DEVLINK_PORT_FN_CAP_MIGRATABLE) {
+			if (!ops->port_function_mig_set) {
+				NL_SET_ERR_MSG_ATTR(extack, attr,
+						    "Port doesn't support migratable function attribute");
+				return -EOPNOTSUPP;
+			}
+			if (devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_VF) {
+				NL_SET_ERR_MSG_ATTR(extack, attr,
+						    "migratable function attribute supported for VFs only");
+				return -EOPNOTSUPP;
+			}
+		}
 	}
 	return 0;
 }
-- 
2.38.1
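
The capabilities travel in a bitfield32 attribute, where the selector says
which bits are meaningful and the value carries their state. A small
user-space sketch of that convention follows. The sketch_* types and helpers
are hypothetical stand-ins and not the kernel definitions; only the
selector/value idea is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the selector/value pair carried by a bitfield32 attribute. */
struct sketch_bitfield32 {
	uint32_t value;
	uint32_t selector;
};

enum {
	SKETCH_CAP_ROCE_BIT,
	SKETCH_CAP_MIGRATABLE_BIT,
};

#define SKETCH_CAP_ROCE		(1u << SKETCH_CAP_ROCE_BIT)
#define SKETCH_CAP_MIGRATABLE	(1u << SKETCH_CAP_MIGRATABLE_BIT)

/* Report path: mark the cap as valid and record whether it is enabled. */
static void sketch_cap_fill(struct sketch_bitfield32 *caps, uint32_t cap,
			    bool is_enable)
{
	caps->selector |= cap;
	if (is_enable)
		caps->value |= cap;
}

/*
 * Set path: returns true only when the caller selected this cap; *enable
 * then holds the requested state. Unselected caps are left alone.
 */
static bool sketch_cap_requested(const struct sketch_bitfield32 *caps,
				 uint32_t cap, bool *enable)
{
	if (!(caps->selector & cap))
		return false;
	*enable = (caps->value & cap) != 0;
	return true;
}
```

Because a cap that is absent from the selector is ignored, a request that
touches only migratable leaves the RoCE setting as it was.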



* [PATCH net-next V3 8/8] net/mlx5: E-Switch, Implement devlink port function cmds to control migratable
  2022-12-04 14:16 [PATCH net-next V3 0/8] devlink: Add port function attribute to enable/disable Roce and migratable Shay Drory
                   ` (6 preceding siblings ...)
  2022-12-04 14:16 ` [PATCH net-next V3 7/8] devlink: Expose port function commands to control migratable Shay Drory
@ 2022-12-04 14:16 ` Shay Drory
  7 siblings, 0 replies; 17+ messages in thread
From: Shay Drory @ 2022-12-04 14:16 UTC (permalink / raw)
  To: netdev, kuba, davem; +Cc: danielj, yishaih, jiri, saeedm, parav, Shay Drory

Implement devlink port function commands to enable / disable migratable.
This is used to control the migratable capability of the device.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../device_drivers/ethernet/mellanox/mlx5.rst |   8 ++
 .../net/ethernet/mellanox/mlx5/core/devlink.c |   2 +
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |   8 ++
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |   5 +
 .../mellanox/mlx5/core/eswitch_offloads.c     | 100 ++++++++++++++++++
 5 files changed, 123 insertions(+)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
index 8b8f95d1293a..6969652f593c 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
@@ -364,6 +364,14 @@ PCI devices/SF.
 mlx5 driver support devlink port function attr mechanism to setup RoCE
 capability. (refer to Documentation/networking/devlink/devlink-port.rst)
 
+migratable capability setup
+---------------------------
+Users who want mlx5 PCI VFs to be able to perform live migration need to
+explicitly enable the VF migratable capability.
+
+mlx5 driver supports devlink port function attr mechanism to set up migratable
+capability. (refer to Documentation/networking/devlink/devlink-port.rst)
+
 SF state setup
 --------------
 To use the SF, the user must activate the SF using the SF function state
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 992cdb3b7cc8..a674bf0b6046 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -316,6 +316,8 @@ static const struct devlink_ops mlx5_devlink_ops = {
 	.rate_leaf_parent_set = mlx5_esw_devlink_rate_parent_set,
 	.port_function_roce_get = mlx5_devlink_port_function_roce_get,
 	.port_function_roce_set = mlx5_devlink_port_function_roce_set,
+	.port_function_mig_get = mlx5_devlink_port_function_mig_get,
+	.port_function_mig_set = mlx5_devlink_port_function_mig_set,
 #endif
 #ifdef CONFIG_MLX5_SF_MANAGER
 	.port_new = mlx5_devlink_sf_port_new,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 001fb1e62135..527e4bffda8d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -794,6 +794,14 @@ static int mlx5_esw_vport_caps_get(struct mlx5_eswitch *esw, struct mlx5_vport *
 	hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability);
 	vport->info.roce_enabled = MLX5_GET(cmd_hca_cap, hca_caps, roce);
 
+	memset(query_ctx, 0, query_out_sz);
+	err = mlx5_vport_get_other_func_cap(esw->dev, vport->vport, query_ctx,
+					    MLX5_CAP_GENERAL_2);
+	if (err)
+		goto out_free;
+
+	hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability);
+	vport->info.mig_enabled = MLX5_GET(cmd_hca_cap_2, hca_caps, migratable);
 out_free:
 	kfree(query_ctx);
 	return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 71f27fb35c49..8625b97411a9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -154,6 +154,7 @@ struct mlx5_vport_info {
 	u8                      spoofchk: 1;
 	u8                      trusted: 1;
 	u8                      roce_enabled: 1;
+	u8                      mig_enabled: 1;
 };
 
 /* Vport context events */
@@ -509,6 +510,10 @@ int mlx5_devlink_port_function_hw_addr_get(struct devlink_port *port,
 int mlx5_devlink_port_function_hw_addr_set(struct devlink_port *port,
 					   const u8 *hw_addr, int hw_addr_len,
 					   struct netlink_ext_ack *extack);
+int mlx5_devlink_port_function_mig_get(struct devlink_port *port, bool *is_enabled,
+				       struct netlink_ext_ack *extack);
+int mlx5_devlink_port_function_mig_set(struct devlink_port *port, bool enable,
+				       struct netlink_ext_ack *extack);
 int mlx5_devlink_port_function_roce_get(struct devlink_port *port, bool *is_enabled,
 					struct netlink_ext_ack *extack);
 int mlx5_devlink_port_function_roce_set(struct devlink_port *port, bool enable,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index f258fd7e27a8..ce38d9c0ad71 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -4038,6 +4038,106 @@ mlx5_devlink_port_function_get_vport(struct devlink_port *port, struct mlx5_eswi
 	return mlx5_eswitch_get_vport(esw, vport_num);
 }
 
+int mlx5_devlink_port_function_mig_get(struct devlink_port *port, bool *is_enabled,
+				       struct netlink_ext_ack *extack)
+{
+	struct mlx5_eswitch *esw;
+	struct mlx5_vport *vport;
+	int err = -EOPNOTSUPP;
+
+	esw = mlx5_devlink_eswitch_get(port->devlink);
+	if (IS_ERR(esw))
+		return PTR_ERR(esw);
+
+	if (!MLX5_CAP_GEN(esw->dev, migration)) {
+		NL_SET_ERR_MSG_MOD(extack, "Device doesn't support migration");
+		return err;
+	}
+
+	vport = mlx5_devlink_port_function_get_vport(port, esw);
+	if (IS_ERR(vport)) {
+		NL_SET_ERR_MSG_MOD(extack, "Invalid port");
+		return PTR_ERR(vport);
+	}
+
+	mutex_lock(&esw->state_lock);
+	if (vport->enabled) {
+		*is_enabled = vport->info.mig_enabled;
+		err = 0;
+	}
+	mutex_unlock(&esw->state_lock);
+	return err;
+}
+
+int mlx5_devlink_port_function_mig_set(struct devlink_port *port, bool enable,
+				       struct netlink_ext_ack *extack)
+{
+	int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out);
+	struct mlx5_eswitch *esw;
+	struct mlx5_vport *vport;
+	void *query_ctx;
+	void *hca_caps;
+	int err = -EOPNOTSUPP;
+
+	esw = mlx5_devlink_eswitch_get(port->devlink);
+	if (IS_ERR(esw))
+		return PTR_ERR(esw);
+
+	if (!MLX5_CAP_GEN(esw->dev, migration)) {
+		NL_SET_ERR_MSG_MOD(extack, "Device doesn't support migration");
+		return err;
+	}
+
+	vport = mlx5_devlink_port_function_get_vport(port, esw);
+	if (IS_ERR(vport)) {
+		NL_SET_ERR_MSG_MOD(extack, "Invalid port");
+		return PTR_ERR(vport);
+	}
+
+	mutex_lock(&esw->state_lock);
+	if (!vport->enabled) {
+		NL_SET_ERR_MSG_MOD(extack, "Eswitch vport is disabled");
+		goto out;
+	}
+
+	if (vport->info.mig_enabled == enable) {
+		err = 0;
+		goto out;
+	}
+
+	query_ctx = kzalloc(query_out_sz, GFP_KERNEL);
+	if (!query_ctx) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = mlx5_vport_get_other_func_cap(esw->dev, vport->vport, query_ctx,
+					    MLX5_CAP_GENERAL_2);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps");
+		goto out_free;
+	}
+
+	hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability);
+	memcpy(hca_caps, MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability),
+	       MLX5_UN_SZ_BYTES(hca_cap_union));
+	MLX5_SET(cmd_hca_cap_2, hca_caps, migratable, enable);
+
+	err = mlx5_vport_set_other_func_cap(esw->dev, hca_caps, vport->vport,
+					    MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE2);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed setting HCA migratable cap");
+		goto out_free;
+	}
+
+	vport->info.mig_enabled = enable;
+
+out_free:
+	kfree(query_ctx);
+out:
+	mutex_unlock(&esw->state_lock);
+	return err;
+}
 int mlx5_devlink_port_function_roce_get(struct devlink_port *port, bool *is_enabled,
 					struct netlink_ext_ack *extack)
 {
-- 
2.38.1
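
mlx5_esw_vport_caps_get() reuses one output buffer for two capability
queries and zeroes it in between, presumably so that nothing left over from
the first reply can be read as part of the second. The user-space sketch
below illustrates the hazard with a mock query. The sketch_* names, the
buffer layout and the "short reply" behaviour are assumptions made for the
example, not a description of the firmware.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_OUT_BYTES 8

enum sketch_cap_group { SKETCH_GROUP_GENERAL, SKETCH_GROUP_GENERAL_2 };

struct sketch_vport_info {
	bool roce_enabled;
	bool mig_enabled;
};

/* Mock query: bit 0 of byte 0 is the flag of interest in each group. */
static int sketch_query(enum sketch_cap_group group, unsigned char *out)
{
	if (group == SKETCH_GROUP_GENERAL)
		memset(out, 0xff, SKETCH_OUT_BYTES);	/* every flag set */
	else
		out[1] = 0x00;	/* short reply: byte 0 is left untouched */
	return 0;
}

static int sketch_caps_get(struct sketch_vport_info *info)
{
	unsigned char out[SKETCH_OUT_BYTES] = { 0 };
	int err;

	err = sketch_query(SKETCH_GROUP_GENERAL, out);
	if (err)
		return err;
	info->roce_enabled = out[0] & 0x01;

	/* Drop the first reply before the buffer is reused. */
	memset(out, 0, sizeof(out));
	err = sketch_query(SKETCH_GROUP_GENERAL_2, out);
	if (err)
		return err;
	info->mig_enabled = out[0] & 0x01;
	return 0;
}
```

Without the memset() the stale 0xff from the first reply would make the
second flag read as set in this mock.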



* Re: [PATCH net-next V3 4/8] devlink: Expose port function commands to control RoCE
  2022-12-04 14:16 ` [PATCH net-next V3 4/8] devlink: Expose port function commands to control RoCE Shay Drory
@ 2022-12-05 10:12   ` Jiri Pirko
  2022-12-05 23:37   ` Shannon Nelson
  1 sibling, 0 replies; 17+ messages in thread
From: Jiri Pirko @ 2022-12-05 10:12 UTC (permalink / raw)
  To: Shay Drory; +Cc: netdev, kuba, davem, danielj, yishaih, jiri, saeedm, parav

Sun, Dec 04, 2022 at 03:16:28PM CET, shayd@nvidia.com wrote:
>Expose port function commands to enable / disable RoCE, this is used to
>control the port RoCE device capabilities.
>
>When RoCE is disabled for a function of the port, function cannot create
>any RoCE specific resources (e.g GID table).
>It also reduces system memory utilization. For example, disabling RoCE for a
>VF/SF saves 1 Mbyte of system memory per function.
>
>Example of a PCI VF port which supports function configuration:
>Set RoCE of the VF's port function.
>
>$ devlink port show pci/0000:06:00.0/2
>pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
>vfnum 1
>    function:
>        hw_addr 00:00:00:00:00:00 roce enable
>
>$ devlink port function set pci/0000:06:00.0/2 roce disable
>
>$ devlink port show pci/0000:06:00.0/2
>pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
>vfnum 1
>    function:
>        hw_addr 00:00:00:00:00:00 roce disable
>
>Signed-off-by: Shay Drory <shayd@nvidia.com>
>Reviewed-by: Jiri Pirko <jiri@nvidia.com>

When you make changes to a patch, you should remove the Reviewed-by and
Acked-by tags.


>---
>v2->v3:
> - change DEVLINK_PORT_FN_SET_CAP to devlink_port_fn_cap_fill.
> - move out DEVLINK_PORT_FN_CAPS_VALID_MASK from UAPI.
> - introduce DEVLINK_PORT_FN_CAP_ROCE and add _BIT suffix to
>   devlink_port_fn_attr_cap.
> - remove DEVLINK_PORT_FN_ATTR_CAPS_MAX
>---
> .../networking/devlink/devlink-port.rst       |  34 +++++-
> include/net/devlink.h                         |  19 +++
> include/uapi/linux/devlink.h                  |  10 ++
> net/core/devlink.c                            | 113 ++++++++++++++++++
> 4 files changed, 175 insertions(+), 1 deletion(-)
>
>diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
>index 2c637f4aae8e..c3302d23e480 100644
>--- a/Documentation/networking/devlink/devlink-port.rst
>+++ b/Documentation/networking/devlink/devlink-port.rst
>@@ -110,7 +110,7 @@ devlink ports for both the controllers.
> Function configuration
> ======================
> 
>-A user can configure the function attribute before enumerating the PCI
>+Users can configure one or more function attributes before enumerating the PCI
> function. Usually it means, user should configure function attribute
> before a bus specific device for the function is created. However, when
> SRIOV is enabled, virtual function devices are created on the PCI bus.
>@@ -122,6 +122,9 @@ A user may set the hardware address of the function using
> `devlink port function set hw_addr` command. For Ethernet port function
> this means a MAC address.
> 
>+Users may also set the RoCE capability of the function using
>+`devlink port function set roce` command.
>+
> Function attributes
> ===================
> 
>@@ -162,6 +165,35 @@ device created for the PCI VF/SF.
>       function:
>         hw_addr 00:00:00:00:88:88
> 
>+RoCE capability setup
>+---------------------
>+Not all PCI VFs/SFs require RoCE capability.
>+
>+When RoCE capability is disabled, it saves system memory per PCI VF/SF.
>+
>+When a user disables RoCE capability for a VF/SF, user applications cannot send
>+or receive any RoCE packets through this VF/SF, and the RoCE GID table for this
>+PCI function will be empty.
>+
>+When RoCE capability is disabled in the device using port function attribute,
>+VF/SF driver cannot override it.
>+
>+- Get RoCE capability of the VF device::
>+
>+    $ devlink port show pci/0000:06:00.0/2
>+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
>+        function:
>+            hw_addr 00:00:00:00:00:00 roce enable
>+
>+- Set RoCE capability of the VF device::
>+
>+    $ devlink port function set pci/0000:06:00.0/2 roce disable
>+
>+    $ devlink port show pci/0000:06:00.0/2
>+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
>+        function:
>+            hw_addr 00:00:00:00:00:00 roce disable
>+
> Subfunction
> ============
> 
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index 5f6eca5e4a40..20306fb8a1d9 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -1451,6 +1451,25 @@ struct devlink_ops {
> 	int (*port_function_hw_addr_set)(struct devlink_port *port,
> 					 const u8 *hw_addr, int hw_addr_len,
> 					 struct netlink_ext_ack *extack);
>+	/**
>+	 * @port_function_roce_get: Port function's roce get function.
>+	 *
>+	 * Query RoCE state of a function managed by the devlink port.
>+	 * Return -EOPNOTSUPP if port function RoCE handling is not supported.
>+	 */
>+	int (*port_function_roce_get)(struct devlink_port *devlink_port,
>+				      bool *is_enable,
>+				      struct netlink_ext_ack *extack);
>+	/**
>+	 * @port_function_roce_set: Port function's roce set function.
>+	 *
>+	 * Enable/Disable the RoCE state of a function managed by the devlink
>+	 * port.
>+	 * Return -EOPNOTSUPP if port function RoCE handling is not supported.
>+	 */
>+	int (*port_function_roce_set)(struct devlink_port *devlink_port,
>+				      bool enable,
>+				      struct netlink_ext_ack *extack);
> 	/**
> 	 * port_new() - Add a new port function of a specified flavor
> 	 * @devlink: Devlink instance
>diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>index 70191d96af89..6cc2925bd478 100644
>--- a/include/uapi/linux/devlink.h
>+++ b/include/uapi/linux/devlink.h
>@@ -658,11 +658,21 @@ enum devlink_resource_unit {
> 	DEVLINK_RESOURCE_UNIT_ENTRY,
> };
> 
>+enum devlink_port_fn_attr_cap {
>+	DEVLINK_PORT_FN_ATTR_CAP_ROCE_BIT,
>+
>+	/* Add new caps above */
>+	__DEVLINK_PORT_FN_ATTR_CAPS_MAX,

Well, this is not needed in UAPI either, but I don't see any good way to
maintain this internally :/ No harm in exposing it.

Looks good,
Reviewed-by: Jiri Pirko <jiri@nvidia.com>




>+};
>+
>+#define DEVLINK_PORT_FN_CAP_ROCE _BITUL(DEVLINK_PORT_FN_ATTR_CAP_ROCE_BIT)
>+
> enum devlink_port_function_attr {
> 	DEVLINK_PORT_FUNCTION_ATTR_UNSPEC,
> 	DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR,	/* binary */
> 	DEVLINK_PORT_FN_ATTR_STATE,	/* u8 */
> 	DEVLINK_PORT_FN_ATTR_OPSTATE,	/* u8 */
>+	DEVLINK_PORT_FN_ATTR_CAPS,	/* bitfield32 */
> 
> 	__DEVLINK_PORT_FUNCTION_ATTR_MAX,
> 	DEVLINK_PORT_FUNCTION_ATTR_MAX = __DEVLINK_PORT_FUNCTION_ATTR_MAX - 1
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index 2b6e11277837..5c4d3abd7677 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -195,11 +195,16 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_hwmsg);
> EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_hwerr);
> EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report);
> 
>+#define DEVLINK_PORT_FN_CAPS_VALID_MASK \
>+	(_BITUL(__DEVLINK_PORT_FN_ATTR_CAPS_MAX) - 1)
>+
> static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_ATTR_MAX + 1] = {
> 	[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] = { .type = NLA_BINARY },
> 	[DEVLINK_PORT_FN_ATTR_STATE] =
> 		NLA_POLICY_RANGE(NLA_U8, DEVLINK_PORT_FN_STATE_INACTIVE,
> 				 DEVLINK_PORT_FN_STATE_ACTIVE),
>+	[DEVLINK_PORT_FN_ATTR_CAPS] =
>+		NLA_POLICY_BITFIELD32(DEVLINK_PORT_FN_CAPS_VALID_MASK),
> };
> 
> static const struct nla_policy devlink_selftest_nl_policy[DEVLINK_ATTR_SELFTEST_ID_MAX + 1] = {
>@@ -692,6 +697,60 @@ devlink_sb_tc_index_get_from_attrs(struct devlink_sb *devlink_sb,
> 	return 0;
> }
> 
>+static void devlink_port_fn_cap_fill(struct nla_bitfield32 *caps,
>+				     u32 cap, bool is_enable)
>+{
>+	caps->selector |= cap;
>+	if (is_enable)
>+		caps->value |= cap;
>+}
>+
>+static int devlink_port_fn_roce_fill(const struct devlink_ops *ops,
>+				     struct devlink_port *devlink_port,
>+				     struct nla_bitfield32 *caps,
>+				     struct netlink_ext_ack *extack)
>+{
>+	bool is_enable;
>+	int err;
>+
>+	if (!ops->port_function_roce_get)
>+		return 0;
>+
>+	err = ops->port_function_roce_get(devlink_port, &is_enable, extack);
>+	if (err) {
>+		if (err == -EOPNOTSUPP)
>+			return 0;
>+		return err;
>+	}
>+
>+	devlink_port_fn_cap_fill(caps, DEVLINK_PORT_FN_CAP_ROCE, is_enable);
>+	return 0;
>+}
>+
>+static int devlink_port_fn_caps_fill(const struct devlink_ops *ops,
>+				     struct devlink_port *devlink_port,
>+				     struct sk_buff *msg,
>+				     struct netlink_ext_ack *extack,
>+				     bool *msg_updated)
>+{
>+	struct nla_bitfield32 caps = {};
>+	int err;
>+
>+	err = devlink_port_fn_roce_fill(ops, devlink_port, &caps, extack);
>+	if (err)
>+		return err;
>+
>+	if (!caps.selector)
>+		return 0;
>+	err = nla_put_bitfield32(msg, DEVLINK_PORT_FN_ATTR_CAPS, caps.value,
>+				 caps.selector);
>+	if (err)
>+		return err;
>+
>+	*msg_updated = true;
>+	return 0;
>+}
>+
> static int
> devlink_sb_tc_index_get_from_info(struct devlink_sb *devlink_sb,
> 				  struct genl_info *info,
>@@ -1275,6 +1334,35 @@ static int devlink_port_fn_state_fill(const struct devlink_ops *ops,
> 	return 0;
> }
> 
>+static int
>+devlink_port_fn_roce_set(struct devlink_port *devlink_port, bool enable,
>+			 struct netlink_ext_ack *extack)
>+{
>+	const struct devlink_ops *ops = devlink_port->devlink->ops;
>+
>+	return ops->port_function_roce_set(devlink_port, enable, extack);
>+}
>+
>+static int devlink_port_fn_caps_set(struct devlink_port *devlink_port,
>+				    const struct nlattr *attr,
>+				    struct netlink_ext_ack *extack)
>+{
>+	struct nla_bitfield32 caps;
>+	u32 caps_value;
>+	int err;
>+
>+	caps = nla_get_bitfield32(attr);
>+	caps_value = caps.value & caps.selector;
>+	if (caps.selector & DEVLINK_PORT_FN_CAP_ROCE) {
>+		err = devlink_port_fn_roce_set(devlink_port,
>+					       caps_value & DEVLINK_PORT_FN_CAP_ROCE,
>+					       extack);
>+		if (err)
>+			return err;
>+	}
>+	return 0;
>+}
>+
> static int
> devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port,
> 				   struct netlink_ext_ack *extack)
>@@ -1293,6 +1381,10 @@ devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *por
> 					   &msg_updated);
> 	if (err)
> 		goto out;
>+	err = devlink_port_fn_caps_fill(ops, port, msg, extack,
>+					&msg_updated);
>+	if (err)
>+		goto out;
> 	err = devlink_port_fn_state_fill(ops, port, msg, extack, &msg_updated);
> out:
> 	if (err || !msg_updated)
>@@ -1665,6 +1757,7 @@ static int devlink_port_function_validate(struct devlink_port *devlink_port,
> 					  struct netlink_ext_ack *extack)
> {
> 	const struct devlink_ops *ops = devlink_port->devlink->ops;
>+	struct nlattr *attr;
> 
> 	if (tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR] &&
> 	    !ops->port_function_hw_addr_set) {
>@@ -1677,6 +1770,18 @@ static int devlink_port_function_validate(struct devlink_port *devlink_port,
> 				   "Function does not support state setting");
> 		return -EOPNOTSUPP;
> 	}
>+	attr = tb[DEVLINK_PORT_FN_ATTR_CAPS];
>+	if (attr) {
>+		struct nla_bitfield32 caps;
>+
>+		caps = nla_get_bitfield32(attr);
>+		if (caps.selector & DEVLINK_PORT_FN_CAP_ROCE &&
>+		    !ops->port_function_roce_set) {
>+			NL_SET_ERR_MSG_ATTR(extack, attr,
>+					    "Port doesn't support RoCE function attribute");
>+			return -EOPNOTSUPP;
>+		}
>+	}
> 	return 0;
> }
> 
>@@ -1704,6 +1809,14 @@ static int devlink_port_function_set(struct devlink_port *port,
> 		if (err)
> 			return err;
> 	}
>+
>+	attr = tb[DEVLINK_PORT_FN_ATTR_CAPS];
>+	if (attr) {
>+		err = devlink_port_fn_caps_set(port, attr, extack);
>+		if (err)
>+			return err;
>+	}
>+
> 	/* Keep this as the last function attribute set, so that when
> 	 * multiple port function attributes are set along with state,
> 	 * Those can be applied first before activating the state.
>-- 
>2.38.1
>
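
The selector/value handling in the patch above can be modelled outside the kernel. The following is a minimal userspace sketch, not the patch's code: `struct bitfield32` stands in for the kernel's `struct nla_bitfield32`, and `cap_fill()`, `caps_apply()` and `CAP_ROCE` are hypothetical names that mirror devlink_port_fn_cap_fill(), devlink_port_fn_caps_set() and DEVLINK_PORT_FN_CAP_ROCE. The driver callback is replaced by a plain bool for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct nla_bitfield32 (assumption). */
struct bitfield32 {
	uint32_t value;    /* state of each capability bit */
	uint32_t selector; /* which bits of 'value' are meaningful */
};

/* Hypothetical equivalent of DEVLINK_PORT_FN_CAP_ROCE (bit 0). */
#define CAP_ROCE (1U << 0)

/*
 * Mirrors devlink_port_fn_cap_fill(): always mark the capability as
 * reported in the selector, and set its value bit only when enabled.
 */
static void cap_fill(struct bitfield32 *caps, uint32_t cap, bool is_enable)
{
	caps->selector |= cap;
	if (is_enable)
		caps->value |= cap;
}

/*
 * Mirrors devlink_port_fn_caps_set(): a capability is changed only when
 * its selector bit is set; unselected bits leave the state untouched.
 * 'roce_enabled' replaces the driver's port_function_roce_set() callback.
 */
static void caps_apply(struct bitfield32 caps, bool *roce_enabled)
{
	uint32_t caps_value = caps.value & caps.selector;

	if (caps.selector & CAP_ROCE)
		*roce_enabled = (caps_value & CAP_ROCE) != 0;
}
```

With this model, a request of { .value = 0, .selector = CAP_ROCE } disables RoCE, while a request with an empty selector changes nothing regardless of its value bits.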


* Re: [PATCH net-next V3 4/8] devlink: Expose port function commands to control RoCE
  2022-12-04 14:16 ` [PATCH net-next V3 4/8] devlink: Expose port function commands to control RoCE Shay Drory
  2022-12-05 10:12   ` Jiri Pirko
@ 2022-12-05 23:37   ` Shannon Nelson
  2022-12-06  2:02     ` Jakub Kicinski
  1 sibling, 1 reply; 17+ messages in thread
From: Shannon Nelson @ 2022-12-05 23:37 UTC (permalink / raw)
  To: Shay Drory, netdev, kuba, davem; +Cc: danielj, yishaih, jiri, saeedm, parav

On 12/4/22 6:16 AM, Shay Drory wrote:
> Expose port function commands to enable / disable RoCE, this is used to
> control the port RoCE device capabilities.
> 
> When RoCE is disabled for a function of the port, function cannot create
> any RoCE-specific resources (e.g. GID table).
> It also reduces system memory utilization. For example, disabling RoCE for a
> VF/SF saves 1 Mbyte of system memory per function.
> 
> Example of a PCI VF port which supports function configuration:
> Set RoCE of the VF's port function.
> 
> $ devlink port show pci/0000:06:00.0/2
> pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
> vfnum 1
>      function:
>          hw_addr 00:00:00:00:00:00 roce enable
> 
> $ devlink port function set pci/0000:06:00.0/2 roce disable
> 
> $ devlink port show pci/0000:06:00.0/2
> pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
> vfnum 1
>      function:
>          hw_addr 00:00:00:00:00:00 roce disable
> 
> Signed-off-by: Shay Drory <shayd@nvidia.com>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> ---



> +
> +#define DEVLINK_PORT_FN_CAP_ROCE _BITUL(DEVLINK_PORT_FN_ATTR_CAP_ROCE_BIT)
> +
>   enum devlink_port_function_attr {
>          DEVLINK_PORT_FUNCTION_ATTR_UNSPEC,
>          DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR,     /* binary */
>          DEVLINK_PORT_FN_ATTR_STATE,     /* u8 */
>          DEVLINK_PORT_FN_ATTR_OPSTATE,   /* u8 */
> +       DEVLINK_PORT_FN_ATTR_CAPS,      /* bitfield32 */

Will 32 bits be enough, or should we start off with u64?  It will 
probably be fine, but since we're setting a uapi thing here we probably 
want to be sure we won't need to change it in the future.

sln


* Re: [PATCH net-next V3 7/8] devlink: Expose port function commands to control migratable
  2022-12-04 14:16 ` [PATCH net-next V3 7/8] devlink: Expose port function commands to control migratable Shay Drory
@ 2022-12-05 23:37   ` Shannon Nelson
  2022-12-06  1:56     ` Jakub Kicinski
  2022-12-06  8:55     ` Jiri Pirko
  0 siblings, 2 replies; 17+ messages in thread
From: Shannon Nelson @ 2022-12-05 23:37 UTC (permalink / raw)
  To: Shay Drory, netdev, kuba, davem; +Cc: danielj, yishaih, jiri, saeedm, parav

On 12/4/22 6:16 AM, Shay Drory wrote:
> Expose port function commands to enable / disable migratable
> capability, this is used to set the port function as migratable.

Since most of the devlink attributes, parameters, etc. are named as nouns
or verbs (e.g. roce, running, rate, err_count, enable_sriov, etc), 
seeing this term in an adjective form is a bit jarring.  This may seem 
like a picky thing, but can we use "migrate" or "migration" throughout 
this patch rather than "migratable"?

> 
> Live migration is the process of transferring a live virtual machine
> from one physical host to another without disrupting its normal
> operation.
> 
> In order for a VM to be able to perform LM, all the VM components must
> be able to perform migration, i.e. to be migratable.
> In order for VF to be migratable, VF must be bound to VFIO driver with
> migration support.
> 
> When migratable capability is enable for a function of the port, the

s/enable/enabled/



> 
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index 20306fb8a1d9..fdb5e8da33ce 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -1470,6 +1470,27 @@ struct devlink_ops {
>          int (*port_function_roce_set)(struct devlink_port *devlink_port,
>                                        bool enable,
>                                        struct netlink_ext_ack *extack);
> +       /**
> +        * @port_function_mig_get: Port function's migratable get function.

I would prefer to see 'mig' spelled out as 'migration'

sln


* Re: [PATCH net-next V3 7/8] devlink: Expose port function commands to control migratable
  2022-12-05 23:37   ` Shannon Nelson
@ 2022-12-06  1:56     ` Jakub Kicinski
  2022-12-06  8:55     ` Jiri Pirko
  1 sibling, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2022-12-06  1:56 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Shay Drory, netdev, davem, danielj, yishaih, jiri, saeedm, parav

On Mon, 5 Dec 2022 15:37:44 -0800 Shannon Nelson wrote:
> > +        * @port_function_mig_get: Port function's migratable get function.  
> 
> I would prefer to see 'mig' spelled out as 'migration'

Seems reasonable; if anything, we should be abbreviating "function" here.
Devlink code and attrs use "fn" already.


* Re: [PATCH net-next V3 4/8] devlink: Expose port function commands to control RoCE
  2022-12-05 23:37   ` Shannon Nelson
@ 2022-12-06  2:02     ` Jakub Kicinski
  2022-12-06  8:52       ` Jiri Pirko
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2022-12-06  2:02 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Shay Drory, netdev, davem, danielj, yishaih, jiri, saeedm, parav

On Mon, 5 Dec 2022 15:37:26 -0800 Shannon Nelson wrote:
> >   enum devlink_port_function_attr {
> >          DEVLINK_PORT_FUNCTION_ATTR_UNSPEC,
> >          DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR,     /* binary */
> >          DEVLINK_PORT_FN_ATTR_STATE,     /* u8 */
> >          DEVLINK_PORT_FN_ATTR_OPSTATE,   /* u8 */
> > +       DEVLINK_PORT_FN_ATTR_CAPS,      /* bitfield32 */  
> 
> Will 32 bits be enough, or should we start off with u64?  It will 
> probably be fine, but since we're setting a uapi thing here we probably 
> want to be sure we won't need to change it in the future.

Ah, if only variable size integer types from Olek were ready :(

Unfortunately there is no bf64 today, so we'd either have to add a
soon-to-be-deprecated bf64 or hold off waiting for Olek...
I reckon the dumb thing of merging bf32 may be the best choice right
now :(
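
To make the bf32 trade-off concrete, here is a small userspace sketch of how a valid mask derived from the capability enum bounds what a 32-bit bitfield may carry. It is an illustration under assumptions, not kernel code: `struct bitfield32`, `FN_ATTR_CAPS_COUNT`, `FN_CAPS_VALID_MASK` and `bitfield32_validate()` are hypothetical names modelled on the patch's enum devlink_port_fn_attr_cap and DEVLINK_PORT_FN_CAPS_VALID_MASK, and the three checks are an assumed model of what a bitfield32 netlink policy rejects.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct nla_bitfield32 (assumption). */
struct bitfield32 {
	uint32_t value;
	uint32_t selector;
};

/* Hypothetical mirror of enum devlink_port_fn_attr_cap. */
enum fn_attr_cap {
	FN_ATTR_CAP_ROCE_BIT,

	/* Add new caps above; a bitfield32 has room for at most 32 of them. */
	FN_ATTR_CAPS_COUNT,
};

/*
 * All bits below the count are valid. The shift is done in 64 bits so the
 * expression stays well defined even when all 32 bits are in use.
 */
#define FN_CAPS_VALID_MASK \
	((uint32_t)((UINT64_C(1) << FN_ATTR_CAPS_COUNT) - 1))

/*
 * Assumed model of bitfield32 policy validation: reject unknown selector
 * bits, unknown value bits, and value bits that are not selected.
 */
static int bitfield32_validate(const struct bitfield32 *bf, uint32_t valid_mask)
{
	if (bf->selector & ~valid_mask)
		return -EINVAL;
	if (bf->value & ~valid_mask)
		return -EINVAL;
	if (bf->value & ~bf->selector)
		return -EINVAL;
	return 0;
}
```

Adding a capability to the enum widens the mask automatically, which is why only the enum (and not the mask) has to be visible to userspace; the hard ceiling of 32 entries is the cost of choosing bf32.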


* Re: [PATCH net-next V3 4/8] devlink: Expose port function commands to control RoCE
  2022-12-06  2:02     ` Jakub Kicinski
@ 2022-12-06  8:52       ` Jiri Pirko
  0 siblings, 0 replies; 17+ messages in thread
From: Jiri Pirko @ 2022-12-06  8:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Shannon Nelson, Shay Drory, netdev, davem, danielj, yishaih,
	jiri, saeedm, parav

Tue, Dec 06, 2022 at 03:02:34AM CET, kuba@kernel.org wrote:
>On Mon, 5 Dec 2022 15:37:26 -0800 Shannon Nelson wrote:
>> >   enum devlink_port_function_attr {
>> >          DEVLINK_PORT_FUNCTION_ATTR_UNSPEC,
>> >          DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR,     /* binary */
>> >          DEVLINK_PORT_FN_ATTR_STATE,     /* u8 */
>> >          DEVLINK_PORT_FN_ATTR_OPSTATE,   /* u8 */
>> > +       DEVLINK_PORT_FN_ATTR_CAPS,      /* bitfield32 */  
>> 
>> Will 32 bits be enough, or should we start off with u64?  It will 
>> probably be fine, but since we're setting a uapi thing here we probably 
>> want to be sure we won't need to change it in the future.
>
>Ah, if only variable size integer types from Olek were ready :(

Or, if the bitfield was variable length from the beginning (as I asked
for :)).


>
>Unfortunately there is no bf64 today, so we'd either have to add a
>soon-to-be-deprecated bf64 or hold off waiting for Olek...
>I reckon the dumb thing of merging bf32 may be the best choice right
>now :(

+1


* Re: [PATCH net-next V3 7/8] devlink: Expose port function commands to control migratable
  2022-12-05 23:37   ` Shannon Nelson
  2022-12-06  1:56     ` Jakub Kicinski
@ 2022-12-06  8:55     ` Jiri Pirko
  2022-12-06 17:52       ` Shannon Nelson
  1 sibling, 1 reply; 17+ messages in thread
From: Jiri Pirko @ 2022-12-06  8:55 UTC (permalink / raw)
  To: Shannon Nelson
  Cc: Shay Drory, netdev, kuba, davem, danielj, yishaih, jiri, saeedm, parav

Tue, Dec 06, 2022 at 12:37:44AM CET, shnelson@amd.com wrote:
>On 12/4/22 6:16 AM, Shay Drory wrote:
>> Expose port function commands to enable / disable migratable
>> capability, this is used to set the port function as migratable.
>
>Since most of the devlink attributes, parameters, etc. are named as nouns or
>verbs (e.g. roce, running, rate, err_count, enable_sriov, etc), seeing this
>term in an adjective form is a bit jarring.  This may seem like a picky
>thing, but can we use "migrate" or "migration" throughout this patch rather
>than "migratable"?

But it is about the "ability to migrate". From how I understand the
language, "migratable" describes that best, doesn't it?


>
>> 
>> Live migration is the process of transferring a live virtual machine
>> from one physical host to another without disrupting its normal
>> operation.
>> 
>> In order for a VM to be able to perform LM, all the VM components must
>> be able to perform migration, i.e. to be migratable.
>> In order for VF to be migratable, VF must be bound to VFIO driver with
>> migration support.
>> 
>> When migratable capability is enable for a function of the port, the
>
>s/enable/enabled/
>
>
>
>> 
>> diff --git a/include/net/devlink.h b/include/net/devlink.h
>> index 20306fb8a1d9..fdb5e8da33ce 100644
>> --- a/include/net/devlink.h
>> +++ b/include/net/devlink.h
>> @@ -1470,6 +1470,27 @@ struct devlink_ops {
>>          int (*port_function_roce_set)(struct devlink_port *devlink_port,
>>                                        bool enable,
>>                                        struct netlink_ext_ack *extack);
>> +       /**
>> +        * @port_function_mig_get: Port function's migratable get function.
>
>I would prefer to see 'mig' spelled out as 'migration'
>
>sln


* Re: [PATCH net-next V3 7/8] devlink: Expose port function commands to control migratable
  2022-12-06  8:55     ` Jiri Pirko
@ 2022-12-06 17:52       ` Shannon Nelson
  0 siblings, 0 replies; 17+ messages in thread
From: Shannon Nelson @ 2022-12-06 17:52 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Shay Drory, netdev, kuba, davem, danielj, yishaih, jiri, saeedm, parav

On 12/6/22 12:55 AM, Jiri Pirko wrote:
> Tue, Dec 06, 2022 at 12:37:44AM CET, shnelson@amd.com wrote:
>> On 12/4/22 6:16 AM, Shay Drory wrote:
>>> Expose port function commands to enable / disable migratable
>>> capability, this is used to set the port function as migratable.
>>
>> Since most of the devlink attributes, parameters, etc. are named as nouns or
>> verbs (e.g. roce, running, rate, err_count, enable_sriov, etc), seeing this
>> term in an adjective form is a bit jarring.  This may seem like a picky
>> thing, but can we use "migrate" or "migration" throughout this patch rather
>> than "migratable"?
> 
> But it is about the "ability to migrate". From how I understand the
> language, "migratable" describes that best, doesn't it?

Yes, 'migratable' describes it, but as I said, the adjective form seems 
a bit jarring to read among the many noun and verb forms found in most 
of the rest of the IDs and ATTRs.

Now, after having some coffee this morning and looking through more of 
the lists, I see there are already a couple like this - 
DEVLINK_TRAP_GENERIC_ID_NON_ROUTABLE and DEVLINK_ATTR_PORT_SPLITTABLE.

Fine, carry on.
sln


Thread overview: 17+ messages
2022-12-04 14:16 [PATCH net-next V3 0/8] devlink: Add port function attribute to enable/disable Roce and migratable Shay Drory
2022-12-04 14:16 ` [PATCH net-next V3 1/8] net/mlx5: Introduce IFC bits for migratable Shay Drory
2022-12-04 14:16 ` [PATCH net-next V3 2/8] devlink: Validate port function request Shay Drory
2022-12-04 14:16 ` [PATCH net-next V3 3/8] devlink: Move devlink port function hw_addr attr documentation Shay Drory
2022-12-04 14:16 ` [PATCH net-next V3 4/8] devlink: Expose port function commands to control RoCE Shay Drory
2022-12-05 10:12   ` Jiri Pirko
2022-12-05 23:37   ` Shannon Nelson
2022-12-06  2:02     ` Jakub Kicinski
2022-12-06  8:52       ` Jiri Pirko
2022-12-04 14:16 ` [PATCH net-next V3 5/8] net/mlx5: Add generic getters for other functions caps Shay Drory
2022-12-04 14:16 ` [PATCH net-next V3 6/8] net/mlx5: E-Switch, Implement devlink port function cmds to control RoCE Shay Drory
2022-12-04 14:16 ` [PATCH net-next V3 7/8] devlink: Expose port function commands to control migratable Shay Drory
2022-12-05 23:37   ` Shannon Nelson
2022-12-06  1:56     ` Jakub Kicinski
2022-12-06  8:55     ` Jiri Pirko
2022-12-06 17:52       ` Shannon Nelson
2022-12-04 14:16 ` [PATCH net-next V3 8/8] net/mlx5: E-Switch, Implement devlink port function cmds " Shay Drory
