* [PATCH rdma-next v2 00/14] Add Dual Port RoCE support
@ 2018-01-04 15:25 Leon Romanovsky
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

Changelog:
 v1 -> v2:
  * Dropped "IB/mlx5: Use correct mdev for vport queries in ib_virt"
 v0 -> v1:
  * Rebased to latest rdma/for-next
  * Enriched commit messages.

Thanks

-------------------------------------
From Daniel:

This feature allows RDMA resources (PD, MR, CQ, QP, etc.) to be used with
both physical ports of capable mlx5 devices. When enabled, a single IB
device with two ports is registered instead of two single port devices.

There are still two PCI devices underlying the two port device; the
capabilities indicate which device is the "master" device and which is
the slave.

When the add callback function is called for a slave device, a list of IB
devices is searched for a matching master device, indicated by the
capabilities and the system_image_guid. If a match is found the slave is
bound to the master device; otherwise it is placed on a list, in case
its master becomes available in the future. When a master device is added,
it searches the list of available slaves for a matching slave device. If a
match is found it binds the slave as its 2nd port. If no match is found
the device still appears as a dual port device, with the 2nd port down.
RDMA resources can still be created that use the not yet available 2nd port.
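
As a rough sketch, the add-time flow condensed from patch 6 of this
series (error handling and the link-layer check omitted):

	void *mlx5_ib_add(struct mlx5_core_dev *mdev)
	{
		if (mlx5_core_is_mp_slave(mdev)) {
			/* Slave: bind to a master with a matching
			 * system_image_guid, or sit on the
			 * unaffiliated list until one appears.
			 */
			u8 port_num = mlx5_core_native_port_num(mdev) - 1;

			return mlx5_ib_add_slave_port(mdev, port_num);
		}

		/* Master: register the IB device; its init stage scans
		 * the unaffiliated list for a matching slave to bind.
		 */
		return __mlx5_ib_add(mdev, &pf_profile);
	}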

Commands related to IB resources are all routed through the master
mlx5_core device. Port specific commands, like those for hardware
counters, are routed to the mlx5_core device of their respective port.
Since devices can appear and disappear asynchronously, a reference count
on the underlying mlx5_core device is maintained. Getting and putting
this reference is only necessary for commands destined to a specific
port; the master core device can be used freely, as it will exist as
long as the IB device exists.
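
For example, a port specific command would follow this get/put pattern
(helper names taken from patch 6; the command itself is illustrative):

	struct mlx5_core_dev *mdev;
	u8 native_port;

	mdev = mlx5_ib_get_native_port_mdev(ibdev, port_num, &native_port);
	if (!mdev)
		return -ENODEV;	/* no core device affiliated right now */

	/* ... execute the port specific command against mdev ... */

	mlx5_ib_put_native_port_mdev(ibdev, port_num);

The put only drops a reference for slave ports; for the master it is a
no-op, matching the "used freely" rule above.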

SR-IOV devices follow the same pattern as the physical ones. VFs of a
master port can bind VFs of slave ports, if available, and operate as
dual port devices.

Examples of devices passed to a VM:
(master)         - One net device, one IB device that has two ports. The slave
                   port will always be down.
(slave)          - One net device, no IB devices.
(slave, slave)   - Two net devices and no IB devices.
(master, master) - Two net devices, two IB devices, each with two ports.
                   The slave port of each device will always be down.
(master, slave)  - Two net devices, one IB device, with two ports. Both
                   ports can be used.

There are no changes to the existing design for net devices.

The feature is disabled by default and it is enabled in firmware
with mlxconfig.

	Thanks
---------------------------------------

Daniel Jurgens (13):
  net/mlx5: Fix race for multiple RoCE enable
  net/mlx5: Set software owner ID during init HCA
  IB/core: Change roce_rescan_device to return void
  IB/mlx5: Reduce the use of num_port capability
  IB/mlx5: Make netdev notifications multiport capable
  {net,IB}/mlx5: Manage port association for multiport RoCE
  IB/mlx5: Move IB event processing onto a workqueue
  IB/mlx5: Implement dual port functionality in query routines
  IB/mlx5: Update counter implementation for dual port RoCE
  {net,IB}/mlx5: Change set_roce_gid to take a port number
  IB/mlx5: Route MADs for dual port RoCE
  IB/mlx5: Don't advertise RAW QP support in dual port mode
  net/mlx5: Set num_vhca_ports capability

Parav Pandit (1):
  IB/mlx5: Change debugfs to have per port contents

 drivers/infiniband/core/cache.c                    |   7 +-
 drivers/infiniband/core/core_priv.h                |   1 -
 drivers/infiniband/core/roce_gid_mgmt.c            |  13 +-
 drivers/infiniband/hw/mlx5/cong.c                  |  83 ++-
 drivers/infiniband/hw/mlx5/mad.c                   |  23 +-
 drivers/infiniband/hw/mlx5/main.c                  | 785 +++++++++++++++++----
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |  52 +-
 drivers/infiniband/hw/mlx5/qp.c                    |   8 +-
 .../net/ethernet/mellanox/mlx5/core/fpga/conn.c    |  11 +-
 drivers/net/ethernet/mellanox/mlx5/core/fw.c       |  10 +-
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c  |   5 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |  12 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c    |  91 ++-
 include/linux/mlx5/device.h                        |   5 +
 include/linux/mlx5/driver.h                        |  29 +-
 include/linux/mlx5/mlx5_ifc.h                      |  32 +-
 include/linux/mlx5/vport.h                         |   4 +
 include/rdma/ib_verbs.h                            |   8 +
 20 files changed, 976 insertions(+), 207 deletions(-)

--
2.15.1

* [PATCH rdma-next v2 01/14] net/mlx5: Fix race for multiple RoCE enable
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2018-01-04 15:25   ` Leon Romanovsky
  2018-01-04 15:25   ` [PATCH rdma-next v2 02/14] net/mlx5: Set software owner ID during init HCA Leon Romanovsky
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

There are two potential problems with the existing implementation.

1. Enable and disable can race after the atomic operations.
2. If a command fails, the refcount is left in an inconsistent state.

Introduce a lock and perform error checking.
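
For example, with the bare atomics the two firmware commands can be
issued in the wrong order (a hypothetical interleaving, names
abbreviated):

	CPU0: disable_roce()             CPU1: enable_roce()
	      atomic_dec_return() -> 0
	                                       atomic_inc_return() -> 1
	                                       update_roce_state(ENABLED)
	      update_roce_state(DISABLED)

RoCE is now disabled in firmware while the refcount claims one user.
Taking a mutex across both the counter update and the command makes the
pair atomic; checking the command result before committing the counter
addresses the second problem.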

Fixes: a6f7d2aff623 ("net/mlx5: Add support for multiple RoCE enable")
Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/vport.c | 33 ++++++++++++++++++++-----
 include/linux/mlx5/driver.h                     |  2 +-
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index d653b0025b13..916523103f16 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -36,6 +36,9 @@
 #include <linux/mlx5/vport.h>
 #include "mlx5_core.h"
 
+/* Mutex to hold while enabling or disabling RoCE */
+static DEFINE_MUTEX(mlx5_roce_en_lock);
+
 static int _mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod,
 				   u16 vport, u32 *out, int outlen)
 {
@@ -988,17 +991,35 @@ static int mlx5_nic_vport_update_roce_state(struct mlx5_core_dev *mdev,
 
 int mlx5_nic_vport_enable_roce(struct mlx5_core_dev *mdev)
 {
-	if (atomic_inc_return(&mdev->roce.roce_en) != 1)
-		return 0;
-	return mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_ENABLED);
+	int err = 0;
+
+	mutex_lock(&mlx5_roce_en_lock);
+	if (!mdev->roce.roce_en)
+		err = mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_ENABLED);
+
+	if (!err)
+		mdev->roce.roce_en++;
+	mutex_unlock(&mlx5_roce_en_lock);
+
+	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_nic_vport_enable_roce);
 
 int mlx5_nic_vport_disable_roce(struct mlx5_core_dev *mdev)
 {
-	if (atomic_dec_return(&mdev->roce.roce_en) != 0)
-		return 0;
-	return mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_DISABLED);
+	int err = 0;
+
+	mutex_lock(&mlx5_roce_en_lock);
+	if (mdev->roce.roce_en) {
+		mdev->roce.roce_en--;
+		if (mdev->roce.roce_en == 0)
+			err = mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_DISABLED);
+
+		if (err)
+			mdev->roce.roce_en++;
+	}
+	mutex_unlock(&mlx5_roce_en_lock);
+	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_nic_vport_disable_roce);
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 0776554f18dc..5b0443c9d337 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -835,7 +835,7 @@ struct mlx5_core_dev {
 	struct mlx5e_resources  mlx5e_res;
 	struct {
 		struct mlx5_rsvd_gids	reserved_gids;
-		atomic_t                roce_en;
+		u32			roce_en;
 	} roce;
 #ifdef CONFIG_MLX5_FPGA
 	struct mlx5_fpga_device *fpga;
-- 
2.15.1

* [PATCH rdma-next v2 02/14] net/mlx5: Set software owner ID during init HCA
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2018-01-04 15:25   ` [PATCH rdma-next v2 01/14] net/mlx5: Fix race for multiple RoCE enable Leon Romanovsky
@ 2018-01-04 15:25   ` Leon Romanovsky
  2018-01-04 15:25   ` [PATCH rdma-next v2 03/14] IB/core: Change roce_rescan_device to return void Leon Romanovsky
                     ` (13 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Generate a unique 128-bit identifier for each host and pass that value to
firmware in the INIT_HCA command if it reports the sw_owner_id
capability. Each device bound to the mlx5_core driver will have the same
software owner ID.

In subsequent patches mlx5_core devices will be bound via a new VPort
command so that they can operate together under a single InfiniBand
device. Only devices that have the same software owner ID can be bound,
to prevent traffic intended for one host arriving at another.

The INIT_HCA command layout was expanded by 128 bits to carry the
sw_owner_id. The command length is provided as an input to FW commands,
so older FW has no problem receiving this command in the new, longer form.
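
Condensed, the flow added below amounts to the following (see the diff
for full context):

	static u32 sw_owner_id[4];	/* one random 128-bit ID per host */

	/* module init, once per host */
	get_random_bytes(&sw_owner_id, sizeof(sw_owner_id));

	/* INIT_HCA: forward the ID only when FW advertises the cap */
	if (MLX5_CAP_GEN(dev, sw_owner_id))
		for (i = 0; i < 4; i++)
			MLX5_ARRAY_SET(init_hca_in, in, sw_owner_id, i,
				       sw_owner_id[i]);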

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/fw.c        | 10 +++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c      |  6 +++++-
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h |  2 +-
 include/linux/mlx5/device.h                         |  5 +++++
 include/linux/mlx5/mlx5_ifc.h                       |  5 ++++-
 5 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 5ef1b56b6a96..9d11e92fb541 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -195,12 +195,20 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
 	return 0;
 }
 
-int mlx5_cmd_init_hca(struct mlx5_core_dev *dev)
+int mlx5_cmd_init_hca(struct mlx5_core_dev *dev, uint32_t *sw_owner_id)
 {
 	u32 out[MLX5_ST_SZ_DW(init_hca_out)] = {0};
 	u32 in[MLX5_ST_SZ_DW(init_hca_in)]   = {0};
+	int i;
 
 	MLX5_SET(init_hca_in, in, opcode, MLX5_CMD_OP_INIT_HCA);
+
+	if (MLX5_CAP_GEN(dev, sw_owner_id)) {
+		for (i = 0; i < 4; i++)
+			MLX5_ARRAY_SET(init_hca_in, in, sw_owner_id, i,
+				       sw_owner_id[i]);
+	}
+
 	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 1292aecb09f2..e382a3ca759e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -75,6 +75,8 @@ static unsigned int prof_sel = MLX5_DEFAULT_PROF;
 module_param_named(prof_sel, prof_sel, uint, 0444);
 MODULE_PARM_DESC(prof_sel, "profile selector. Valid range 0 - 2");
 
+static u32 sw_owner_id[4];
+
 enum {
 	MLX5_ATOMIC_REQ_MODE_BE = 0x0,
 	MLX5_ATOMIC_REQ_MODE_HOST_ENDIANNESS = 0x1,
@@ -1055,7 +1057,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 		goto reclaim_boot_pages;
 	}
 
-	err = mlx5_cmd_init_hca(dev);
+	err = mlx5_cmd_init_hca(dev, sw_owner_id);
 	if (err) {
 		dev_err(&pdev->dev, "init hca failed\n");
 		goto err_pagealloc_stop;
@@ -1577,6 +1579,8 @@ static int __init init(void)
 {
 	int err;
 
+	get_random_bytes(&sw_owner_id, sizeof(sw_owner_id));
+
 	mlx5_core_verify_params();
 	mlx5_register_debugfs();
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index ff4a0b889a6f..b05868728da7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -86,7 +86,7 @@ enum {
 
 int mlx5_query_hca_caps(struct mlx5_core_dev *dev);
 int mlx5_query_board_id(struct mlx5_core_dev *dev);
-int mlx5_cmd_init_hca(struct mlx5_core_dev *dev);
+int mlx5_cmd_init_hca(struct mlx5_core_dev *dev, uint32_t *sw_owner_id);
 int mlx5_cmd_teardown_hca(struct mlx5_core_dev *dev);
 int mlx5_cmd_force_teardown_hca(struct mlx5_core_dev *dev);
 void mlx5_core_event(struct mlx5_core_dev *dev, enum mlx5_dev_event event,
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 9aee835b7393..e5258ee4e38b 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -79,6 +79,11 @@
 		     << __mlx5_dw_bit_off(typ, fld))); \
 } while (0)
 
+#define MLX5_ARRAY_SET(typ, p, fld, idx, v) do { \
+	BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 32); \
+	MLX5_SET(typ, p, fld[idx], v); \
+} while (0)
+
 #define MLX5_SET_TO_ONES(typ, p, fld) do { \
 	BUILD_BUG_ON(__mlx5_st_sz_bits(typ) % 32);             \
 	*((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 38a7577a9ce7..b1c81d7a86cb 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1066,7 +1066,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         reserved_at_5f8[0x3];
 	u8         log_max_xrq[0x5];
 
-	u8         reserved_at_600[0x200];
+	u8         reserved_at_600[0x1e];
+	u8	   sw_owner_id[0x1];
+	u8	   reserved_at_61f[0x1e1];
 };
 
 enum mlx5_flow_destination_type {
@@ -5531,6 +5533,7 @@ struct mlx5_ifc_init_hca_in_bits {
 	u8         op_mod[0x10];
 
 	u8         reserved_at_40[0x40];
+	u8	   sw_owner_id[4][0x20];
 };
 
 struct mlx5_ifc_init2rtr_qp_out_bits {
-- 
2.15.1

* [PATCH rdma-next v2 03/14] IB/core: Change roce_rescan_device to return void
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2018-01-04 15:25   ` [PATCH rdma-next v2 01/14] net/mlx5: Fix race for multiple RoCE enable Leon Romanovsky
  2018-01-04 15:25   ` [PATCH rdma-next v2 02/14] net/mlx5: Set software owner ID during init HCA Leon Romanovsky
@ 2018-01-04 15:25   ` Leon Romanovsky
  2018-01-04 15:25   ` [PATCH rdma-next v2 04/14] IB/mlx5: Reduce the use of num_port capability Leon Romanovsky
                     ` (12 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

It always returns 0. Change return type to void.

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/cache.c         | 7 +------
 drivers/infiniband/core/core_priv.h     | 2 +-
 drivers/infiniband/core/roce_gid_mgmt.c | 4 +---
 3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 7babdbceb6d0..fc4022884dbb 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -821,12 +821,7 @@ static int gid_table_setup_one(struct ib_device *ib_dev)
 	if (err)
 		return err;
 
-	err = roce_rescan_device(ib_dev);
-
-	if (err) {
-		gid_table_cleanup_one(ib_dev);
-		gid_table_release_one(ib_dev);
-	}
+	roce_rescan_device(ib_dev);
 
 	return err;
 }
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 0b62ff4bbb25..6b58267b64c6 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -138,7 +138,7 @@ int ib_cache_gid_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
 int roce_gid_mgmt_init(void);
 void roce_gid_mgmt_cleanup(void);
 
-int roce_rescan_device(struct ib_device *ib_dev);
+void roce_rescan_device(struct ib_device *ib_dev);
 unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port);
 
 int ib_cache_setup_one(struct ib_device *device);
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 90e3889b7fbe..ebfe45739ca7 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -412,12 +412,10 @@ static void enum_all_gids_of_dev_cb(struct ib_device *ib_dev,
 
 /* This function will rescan all of the network devices in the system
  * and add their gids, as needed, to the relevant RoCE devices. */
-int roce_rescan_device(struct ib_device *ib_dev)
+void roce_rescan_device(struct ib_device *ib_dev)
 {
 	ib_enum_roce_netdev(ib_dev, pass_all_filter, NULL,
 			    enum_all_gids_of_dev_cb, NULL);
-
-	return 0;
 }
 
 static void callback_for_addr_gid_device_scan(struct ib_device *device,
-- 
2.15.1

* [PATCH rdma-next v2 04/14] IB/mlx5: Reduce the use of num_port capability
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (2 preceding siblings ...)
  2018-01-04 15:25   ` [PATCH rdma-next v2 03/14] IB/core: Change roce_rescan_device to return void Leon Romanovsky
@ 2018-01-04 15:25   ` Leon Romanovsky
  2018-01-04 15:25   ` [PATCH rdma-next v2 05/14] IB/mlx5: Make netdev notifications multiport capable Leon Romanovsky
                     ` (11 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Remove use of the num_ports general capability throughout. The number of
ports will be variable in the future and will be reported in a different
way.
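
For reference, patch 6 of this series derives the cached count as:

	dev->num_ports = max(MLX5_CAP_GEN(mdev, num_ports),
			     MLX5_CAP_GEN(mdev, num_vhca_ports));

so a dual port capable device reports two ports even before a slave is
affiliated.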

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/mad.c  |  2 +-
 drivers/infiniband/hw/mlx5/main.c | 16 ++++++++--------
 drivers/infiniband/hw/mlx5/qp.c   |  5 ++---
 3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index 1003b0133a49..0559e0a9e398 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -519,7 +519,7 @@ int mlx5_query_mad_ifc_port(struct ib_device *ibdev, u8 port,
 	int ext_active_speed;
 	int err = -ENOMEM;
 
-	if (port < 1 || port > MLX5_CAP_GEN(mdev, num_ports)) {
+	if (port < 1 || port > dev->num_ports) {
 		mlx5_ib_warn(dev, "invalid port number %d\n", port);
 		return -EINVAL;
 	}
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 675144a20f95..0c24cd42b411 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1512,7 +1512,7 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
 	mutex_init(&context->db_page_mutex);
 
 	resp.tot_bfregs = req.total_num_bfregs;
-	resp.num_ports = MLX5_CAP_GEN(dev->mdev, num_ports);
+	resp.num_ports = dev->num_ports;
 
 	if (field_avail(typeof(resp), cqe_version, udata->outlen))
 		resp.response_length += sizeof(resp.cqe_version);
@@ -2774,7 +2774,7 @@ static struct ib_flow *mlx5_ib_create_flow(struct ib_qp *qp,
 		return ERR_PTR(-ENOMEM);
 
 	if (domain != IB_FLOW_DOMAIN_USER ||
-	    flow_attr->port > MLX5_CAP_GEN(dev->mdev, num_ports) ||
+	    flow_attr->port > dev->num_ports ||
 	    (flow_attr->flags & ~IB_FLOW_ATTR_FLAGS_DONT_TRAP))
 		return ERR_PTR(-EINVAL);
 
@@ -3122,7 +3122,7 @@ static int set_has_smi_cap(struct mlx5_ib_dev *dev)
 	int err;
 	int port;
 
-	for (port = 1; port <= MLX5_CAP_GEN(dev->mdev, num_ports); port++) {
+	for (port = 1; port <= dev->num_ports; port++) {
 		dev->mdev->port_caps[port - 1].has_smi = false;
 		if (MLX5_CAP_GEN(dev->mdev, port_type) ==
 		    MLX5_CAP_PORT_TYPE_IB) {
@@ -3149,7 +3149,7 @@ static void get_ext_port_caps(struct mlx5_ib_dev *dev)
 {
 	int port;
 
-	for (port = 1; port <= MLX5_CAP_GEN(dev->mdev, num_ports); port++)
+	for (port = 1; port <= dev->num_ports; port++)
 		mlx5_query_ext_port_caps(dev, port);
 }
 
@@ -3179,7 +3179,7 @@ static int get_port_caps(struct mlx5_ib_dev *dev)
 		goto out;
 	}
 
-	for (port = 1; port <= MLX5_CAP_GEN(dev->mdev, num_ports); port++) {
+	for (port = 1; port <= dev->num_ports; port++) {
 		memset(pprops, 0, sizeof(*pprops));
 		err = mlx5_ib_query_port(&dev->ib_dev, port, pprops);
 		if (err) {
@@ -4061,7 +4061,7 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	const char *name;
 	int err;
 
-	dev->port = kcalloc(MLX5_CAP_GEN(mdev, num_ports), sizeof(*dev->port),
+	dev->port = kcalloc(dev->num_ports, sizeof(*dev->port),
 			    GFP_KERNEL);
 	if (!dev->port)
 		return -ENOMEM;
@@ -4083,8 +4083,7 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	dev->ib_dev.owner		= THIS_MODULE;
 	dev->ib_dev.node_type		= RDMA_NODE_IB_CA;
 	dev->ib_dev.local_dma_lkey	= 0 /* not supported for now */;
-	dev->num_ports		= MLX5_CAP_GEN(mdev, num_ports);
-	dev->ib_dev.phys_port_cnt     = dev->num_ports;
+	dev->ib_dev.phys_port_cnt	= dev->num_ports;
 	dev->ib_dev.num_comp_vectors    =
 		dev->mdev->priv.eq_table.num_comp_vectors;
 	dev->ib_dev.dev.parent		= &mdev->pdev->dev;
@@ -4443,6 +4442,7 @@ static void *__mlx5_ib_add(struct mlx5_core_dev *mdev,
 		return NULL;
 
 	dev->mdev = mdev;
+	dev->num_ports = MLX5_CAP_GEN(mdev, num_ports);
 
 	for (i = 0; i < MLX5_IB_STAGE_MAX; i++) {
 		if (profile->stage[i].init) {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index f59de13b657e..4b08472cf0ba 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3351,7 +3351,7 @@ int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 
 	if ((attr_mask & IB_QP_PORT) &&
 	    (attr->port_num == 0 ||
-	     attr->port_num > MLX5_CAP_GEN(dev->mdev, num_ports))) {
+	     attr->port_num > dev->num_ports)) {
 		mlx5_ib_dbg(dev, "invalid port number %d. number of ports is %d\n",
 			    attr->port_num, dev->num_ports);
 		goto out;
@@ -4670,14 +4670,13 @@ static void to_rdma_ah_attr(struct mlx5_ib_dev *ibdev,
 			    struct rdma_ah_attr *ah_attr,
 			    struct mlx5_qp_path *path)
 {
-	struct mlx5_core_dev *dev = ibdev->mdev;
 
 	memset(ah_attr, 0, sizeof(*ah_attr));
 
 	ah_attr->type = rdma_ah_find_type(&ibdev->ib_dev, path->port);
 	rdma_ah_set_port_num(ah_attr, path->port);
 	if (rdma_ah_get_port_num(ah_attr) == 0 ||
-	    rdma_ah_get_port_num(ah_attr) > MLX5_CAP_GEN(dev, num_ports))
+	    rdma_ah_get_port_num(ah_attr) > ibdev->num_ports)
 		return;
 
 	rdma_ah_set_port_num(ah_attr, path->port);
-- 
2.15.1

* [PATCH rdma-next v2 05/14] IB/mlx5: Make netdev notifications multiport capable
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (3 preceding siblings ...)
  2018-01-04 15:25   ` [PATCH rdma-next v2 04/14] IB/mlx5: Reduce the use of num_port capability Leon Romanovsky
@ 2018-01-04 15:25   ` Leon Romanovsky
  2018-01-04 15:25   ` [PATCH rdma-next v2 06/14] {net,IB}/mlx5: Manage port association for multiport RoCE Leon Romanovsky
                     ` (10 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When multiple RoCE ports are supported, registration for events on
multiple netdevs is required. Refactor the event registration and
handling to support multiple ports.
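
The refactor hinges on embedding the notifier_block in the per-port
struct mlx5_roce, so the handler can recover its port via
container_of(); a minimal sketch of the pattern used below:

	struct mlx5_roce {
		/* ... existing fields ... */
		struct notifier_block	nb;
		struct mlx5_ib_dev	*dev;
		u8			native_port_num;
	};

	static int mlx5_netdev_event(struct notifier_block *this,
				     unsigned long event, void *ptr)
	{
		struct mlx5_roce *roce = container_of(this, struct mlx5_roce, nb);
		u8 port_num = roce->native_port_num; /* which port fired */

		/* ... handle the event for this port only ... */
		return NOTIFY_DONE;
	}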

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c    | 85 +++++++++++++++++++++---------------
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  4 +-
 drivers/infiniband/hw/mlx5/qp.c      |  3 +-
 include/linux/mlx5/driver.h          |  5 +++
 4 files changed, 60 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 0c24cd42b411..5fcb2ed94c11 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -113,24 +113,30 @@ static int get_port_state(struct ib_device *ibdev,
 static int mlx5_netdev_event(struct notifier_block *this,
 			     unsigned long event, void *ptr)
 {
+	struct mlx5_roce *roce = container_of(this, struct mlx5_roce, nb);
 	struct net_device *ndev = netdev_notifier_info_to_dev(ptr);
-	struct mlx5_ib_dev *ibdev = container_of(this, struct mlx5_ib_dev,
-						 roce.nb);
+	u8 port_num = roce->native_port_num;
+	struct mlx5_core_dev *mdev;
+	struct mlx5_ib_dev *ibdev;
+
+	ibdev = roce->dev;
+	mdev = ibdev->mdev;
 
 	switch (event) {
 	case NETDEV_REGISTER:
 	case NETDEV_UNREGISTER:
-		write_lock(&ibdev->roce.netdev_lock);
-		if (ndev->dev.parent == &ibdev->mdev->pdev->dev)
-			ibdev->roce.netdev = (event == NETDEV_UNREGISTER) ?
-					     NULL : ndev;
-		write_unlock(&ibdev->roce.netdev_lock);
+		write_lock(&roce->netdev_lock);
+
+		if (ndev->dev.parent == &mdev->pdev->dev)
+			roce->netdev = (event == NETDEV_UNREGISTER) ?
+					NULL : ndev;
+		write_unlock(&roce->netdev_lock);
 		break;
 
 	case NETDEV_CHANGE:
 	case NETDEV_UP:
 	case NETDEV_DOWN: {
-		struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(ibdev->mdev);
+		struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(mdev);
 		struct net_device *upper = NULL;
 
 		if (lag_ndev) {
@@ -138,27 +144,28 @@ static int mlx5_netdev_event(struct notifier_block *this,
 			dev_put(lag_ndev);
 		}
 
-		if ((upper == ndev || (!upper && ndev == ibdev->roce.netdev))
+		if ((upper == ndev || (!upper && ndev == roce->netdev))
 		    && ibdev->ib_active) {
 			struct ib_event ibev = { };
 			enum ib_port_state port_state;
 
-			if (get_port_state(&ibdev->ib_dev, 1, &port_state))
-				return NOTIFY_DONE;
+			if (get_port_state(&ibdev->ib_dev, port_num,
+					   &port_state))
+				goto done;
 
-			if (ibdev->roce.last_port_state == port_state)
-				return NOTIFY_DONE;
+			if (roce->last_port_state == port_state)
+				goto done;
 
-			ibdev->roce.last_port_state = port_state;
+			roce->last_port_state = port_state;
 			ibev.device = &ibdev->ib_dev;
 			if (port_state == IB_PORT_DOWN)
 				ibev.event = IB_EVENT_PORT_ERR;
 			else if (port_state == IB_PORT_ACTIVE)
 				ibev.event = IB_EVENT_PORT_ACTIVE;
 			else
-				return NOTIFY_DONE;
+				goto done;
 
-			ibev.element.port_num = 1;
+			ibev.element.port_num = port_num;
 			ib_dispatch_event(&ibev);
 		}
 		break;
@@ -167,7 +174,7 @@ static int mlx5_netdev_event(struct notifier_block *this,
 	default:
 		break;
 	}
-
+done:
 	return NOTIFY_DONE;
 }
 
@@ -183,11 +190,11 @@ static struct net_device *mlx5_ib_get_netdev(struct ib_device *device,
 
 	/* Ensure ndev does not disappear before we invoke dev_hold()
 	 */
-	read_lock(&ibdev->roce.netdev_lock);
-	ndev = ibdev->roce.netdev;
+	read_lock(&ibdev->roce[port_num - 1].netdev_lock);
+	ndev = ibdev->roce[port_num - 1].netdev;
 	if (ndev)
 		dev_hold(ndev);
-	read_unlock(&ibdev->roce.netdev_lock);
+	read_unlock(&ibdev->roce[port_num - 1].netdev_lock);
 
 	return ndev;
 }
@@ -3579,33 +3586,33 @@ static void mlx5_eth_lag_cleanup(struct mlx5_ib_dev *dev)
 	}
 }
 
-static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev)
+static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
 {
 	int err;
 
-	dev->roce.nb.notifier_call = mlx5_netdev_event;
-	err = register_netdevice_notifier(&dev->roce.nb);
+	dev->roce[port_num].nb.notifier_call = mlx5_netdev_event;
+	err = register_netdevice_notifier(&dev->roce[port_num].nb);
 	if (err) {
-		dev->roce.nb.notifier_call = NULL;
+		dev->roce[port_num].nb.notifier_call = NULL;
 		return err;
 	}
 
 	return 0;
 }
 
-static void mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev)
+static void mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
 {
-	if (dev->roce.nb.notifier_call) {
-		unregister_netdevice_notifier(&dev->roce.nb);
-		dev->roce.nb.notifier_call = NULL;
+	if (dev->roce[port_num].nb.notifier_call) {
+		unregister_netdevice_notifier(&dev->roce[port_num].nb);
+		dev->roce[port_num].nb.notifier_call = NULL;
 	}
 }
 
-static int mlx5_enable_eth(struct mlx5_ib_dev *dev)
+static int mlx5_enable_eth(struct mlx5_ib_dev *dev, u8 port_num)
 {
 	int err;
 
-	err = mlx5_add_netdev_notifier(dev);
+	err = mlx5_add_netdev_notifier(dev, port_num);
 	if (err)
 		return err;
 
@@ -3626,7 +3633,7 @@ static int mlx5_enable_eth(struct mlx5_ib_dev *dev)
 		mlx5_nic_vport_disable_roce(dev->mdev);
 
 err_unregister_netdevice_notifier:
-	mlx5_remove_netdev_notifier(dev);
+	mlx5_remove_netdev_notifier(dev, port_num);
 	return err;
 }
 
@@ -4066,7 +4073,6 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	if (!dev->port)
 		return -ENOMEM;
 
-	rwlock_init(&dev->roce.netdev_lock);
 	err = get_port_caps(dev);
 	if (err)
 		goto err_free_port;
@@ -4246,12 +4252,21 @@ static int mlx5_ib_stage_roce_init(struct mlx5_ib_dev *dev)
 	struct mlx5_core_dev *mdev = dev->mdev;
 	enum rdma_link_layer ll;
 	int port_type_cap;
+	u8 port_num = 0;
 	int err;
+	int i;
 
 	port_type_cap = MLX5_CAP_GEN(mdev, port_type);
 	ll = mlx5_port_type_cap_to_rdma_ll(port_type_cap);
 
 	if (ll == IB_LINK_LAYER_ETHERNET) {
+		for (i = 0; i < dev->num_ports; i++) {
+			rwlock_init(&dev->roce[i].netdev_lock);
+			dev->roce[i].dev = dev;
+			dev->roce[i].native_port_num = i + 1;
+			dev->roce[i].last_port_state = IB_PORT_DOWN;
+		}
+
 		dev->ib_dev.get_netdev	= mlx5_ib_get_netdev;
 		dev->ib_dev.create_wq	 = mlx5_ib_create_wq;
 		dev->ib_dev.modify_wq	 = mlx5_ib_modify_wq;
@@ -4264,10 +4279,9 @@ static int mlx5_ib_stage_roce_init(struct mlx5_ib_dev *dev)
 			(1ull << IB_USER_VERBS_EX_CMD_DESTROY_WQ) |
 			(1ull << IB_USER_VERBS_EX_CMD_CREATE_RWQ_IND_TBL) |
 			(1ull << IB_USER_VERBS_EX_CMD_DESTROY_RWQ_IND_TBL);
-		err = mlx5_enable_eth(dev);
+		err = mlx5_enable_eth(dev, port_num);
 		if (err)
 			return err;
-		dev->roce.last_port_state = IB_PORT_DOWN;
 	}
 
 	return 0;
@@ -4278,13 +4292,14 @@ static void mlx5_ib_stage_roce_cleanup(struct mlx5_ib_dev *dev)
 	struct mlx5_core_dev *mdev = dev->mdev;
 	enum rdma_link_layer ll;
 	int port_type_cap;
+	u8 port_num = 0;
 
 	port_type_cap = MLX5_CAP_GEN(mdev, port_type);
 	ll = mlx5_port_type_cap_to_rdma_ll(port_type_cap);
 
 	if (ll == IB_LINK_LAYER_ETHERNET) {
 		mlx5_disable_eth(dev);
-		mlx5_remove_netdev_notifier(dev);
+		mlx5_remove_netdev_notifier(dev, port_num);
 	}
 }
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 57405cf8a5c5..6106dde35144 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -667,6 +667,8 @@ struct mlx5_roce {
 	struct notifier_block	nb;
 	atomic_t		next_port;
 	enum ib_port_state last_port_state;
+	struct mlx5_ib_dev	*dev;
+	u8			native_port_num;
 };
 
 struct mlx5_ib_dbg_param {
@@ -757,7 +759,7 @@ struct mlx5_ib_profile {
 struct mlx5_ib_dev {
 	struct ib_device		ib_dev;
 	struct mlx5_core_dev		*mdev;
-	struct mlx5_roce		roce;
+	struct mlx5_roce		roce[MLX5_MAX_PORTS];
 	int				num_ports;
 	/* serialize update of capability mask
 	 */
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 4b08472cf0ba..ae36db3d0deb 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -2962,8 +2962,9 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 		    (ibqp->qp_type == IB_QPT_XRC_INI) ||
 		    (ibqp->qp_type == IB_QPT_XRC_TGT)) {
 			if (mlx5_lag_is_active(dev->mdev)) {
+				u8 p = mlx5_core_native_port_num(dev->mdev);
 				tx_affinity = (unsigned int)atomic_add_return(1,
-						&dev->roce.next_port) %
+						&dev->roce[p].next_port) %
 						MLX5_MAX_PORTS + 1;
 				context->flags |= cpu_to_be32(tx_affinity << 24);
 			}
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 5b0443c9d337..28733529f6ff 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1234,6 +1234,11 @@ static inline bool mlx5_rl_is_supported(struct mlx5_core_dev *dev)
 	return !!(dev->priv.rl_table.max_size);
 }
 
+static inline int mlx5_core_native_port_num(struct mlx5_core_dev *dev)
+{
+	return 1;
+}
+
 enum {
 	MLX5_TRIGGERED_CMD_COMP = (u64)1 << 32,
 };
-- 
2.15.1

* [PATCH rdma-next v2 06/14] {net,IB}/mlx5: Manage port association for multiport RoCE
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (4 preceding siblings ...)
  2018-01-04 15:25   ` [PATCH rdma-next v2 05/14] IB/mlx5: Make netdev notifications multiport capable Leon Romanovsky
@ 2018-01-04 15:25   ` Leon Romanovsky
       [not found]     ` <20180104152544.28919-7-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2018-01-04 15:25   ` [PATCH rdma-next v2 07/14] IB/mlx5: Move IB event processing onto a workqueue Leon Romanovsky
                     ` (9 subsequent siblings)
  15 siblings, 1 reply; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When mlx5_ib_add is called, determine whether the mlx5 core device being
added is capable of dual port RoCE operation. If it is, determine
whether it is a master device or a slave device using the
num_vhca_ports and affiliate_nic_vport_criteria capabilities.

If the device is a slave, attempt to find a master device to affiliate it
with. Devices that can be affiliated will share a system image guid. If
none is found, place it on a list of unaffiliated ports. If a master is
found, bind the port to it by configuring the port affiliation in the NIC
vport context.

Similarly, when mlx5_ib_remove is called, determine the port type. If it
is a slave port, unaffiliate it from the master device; otherwise just
remove it from the unaffiliated port list.
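
Unbinding must wait for in-flight users of the slave's mdev to drop
their references; condensed from mlx5_ib_unbind_slave_port() below
(lock drop/retake around the waits elided):

	comps = mpi->mdev_refcnt;
	if (comps) {
		mpi->unaffiliate = true;
		init_completion(&mpi->unref_comp);

		/* each mlx5_ib_put_native_port_mdev() completes once */
		for (i = 0; i < comps; i++)
			wait_for_completion(&mpi->unref_comp);

		mpi->unaffiliate = false;
	}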

The IB device is registered as a multiport device, even if a 2nd port is
not available for affiliation. When the 2nd port is affiliated later the
GID cache must be refreshed in order to get the default GIDs for the 2nd
port in the cache. Export roce_rescan_device to provide a mechanism to
refresh the cache after a new port is bound.

In a multiport configuration all IB object (QP, MR, PD, etc.) related
commands should flow through the master mlx5_core_dev; other commands
must be sent to the mlx5_core_dev of the slave port. An interface is
provided to get the correct mdev for non IB object commands.

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/cache.c                    |   2 +-
 drivers/infiniband/core/core_priv.h                |   1 -
 drivers/infiniband/core/roce_gid_mgmt.c            |  11 +-
 drivers/infiniband/hw/mlx5/main.c                  | 421 +++++++++++++++++++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |  40 ++
 .../net/ethernet/mellanox/mlx5/core/fpga/conn.c    |   4 +-
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c    |  58 +++
 include/linux/mlx5/driver.h                        |  22 +-
 include/linux/mlx5/mlx5_ifc.h                      |  31 +-
 include/linux/mlx5/vport.h                         |   4 +
 include/rdma/ib_verbs.h                            |   8 +
 12 files changed, 562 insertions(+), 42 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index fc4022884dbb..e9a409d7f4e2 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -821,7 +821,7 @@ static int gid_table_setup_one(struct ib_device *ib_dev)
 	if (err)
 		return err;
 
-	roce_rescan_device(ib_dev);
+	rdma_roce_rescan_device(ib_dev);
 
 	return err;
 }
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 6b58267b64c6..833ea36e823b 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -138,7 +138,6 @@ int ib_cache_gid_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
 int roce_gid_mgmt_init(void);
 void roce_gid_mgmt_cleanup(void);
 
-void roce_rescan_device(struct ib_device *ib_dev);
 unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port);
 
 int ib_cache_setup_one(struct ib_device *device);
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index ebfe45739ca7..5a52ec77940a 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -410,13 +410,18 @@ static void enum_all_gids_of_dev_cb(struct ib_device *ib_dev,
 	rtnl_unlock();
 }
 
-/* This function will rescan all of the network devices in the system
- * and add their gids, as needed, to the relevant RoCE devices. */
-void roce_rescan_device(struct ib_device *ib_dev)
+/**
+ * rdma_roce_rescan_device - Rescan all of the network devices in the system
+ * and add their gids, as needed, to the relevant RoCE devices.
+ *
+ * @device:         the rdma device
+ */
+void rdma_roce_rescan_device(struct ib_device *ib_dev)
 {
 	ib_enum_roce_netdev(ib_dev, pass_all_filter, NULL,
 			    enum_all_gids_of_dev_cb, NULL);
 }
+EXPORT_SYMBOL(rdma_roce_rescan_device);
 
 static void callback_for_addr_gid_device_scan(struct ib_device *device,
 					      u8 port,
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 5fcb2ed94c11..4fbbe4c7a99b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -74,6 +74,23 @@ enum {
 	MLX5_ATOMIC_SIZE_QP_8BYTES = 1 << 3,
 };
 
+static LIST_HEAD(mlx5_ib_unaffiliated_port_list);
+static LIST_HEAD(mlx5_ib_dev_list);
+/*
+ * This mutex should be held when accessing either of the above lists
+ */
+static DEFINE_MUTEX(mlx5_ib_multiport_mutex);
+
+struct mlx5_ib_dev *mlx5_ib_get_ibdev_from_mpi(struct mlx5_ib_multiport_info *mpi)
+{
+	struct mlx5_ib_dev *dev;
+
+	mutex_lock(&mlx5_ib_multiport_mutex);
+	dev = mpi->ibdev;
+	mutex_unlock(&mlx5_ib_multiport_mutex);
+	return dev;
+}
+
 static enum rdma_link_layer
 mlx5_port_type_cap_to_rdma_ll(int port_type_cap)
 {
@@ -120,7 +137,9 @@ static int mlx5_netdev_event(struct notifier_block *this,
 	struct mlx5_ib_dev *ibdev;
 
 	ibdev = roce->dev;
-	mdev = ibdev->mdev;
+	mdev = mlx5_ib_get_native_port_mdev(ibdev, port_num, NULL);
+	if (!mdev)
+		return NOTIFY_DONE;
 
 	switch (event) {
 	case NETDEV_REGISTER:
@@ -175,6 +194,7 @@ static int mlx5_netdev_event(struct notifier_block *this,
 		break;
 	}
 done:
+	mlx5_ib_put_native_port_mdev(ibdev, port_num);
 	return NOTIFY_DONE;
 }
 
@@ -183,10 +203,15 @@ static struct net_device *mlx5_ib_get_netdev(struct ib_device *device,
 {
 	struct mlx5_ib_dev *ibdev = to_mdev(device);
 	struct net_device *ndev;
+	struct mlx5_core_dev *mdev;
+
+	mdev = mlx5_ib_get_native_port_mdev(ibdev, port_num, NULL);
+	if (!mdev)
+		return NULL;
 
-	ndev = mlx5_lag_get_roce_netdev(ibdev->mdev);
+	ndev = mlx5_lag_get_roce_netdev(mdev);
 	if (ndev)
-		return ndev;
+		goto out;
 
 	/* Ensure ndev does not disappear before we invoke dev_hold()
 	 */
@@ -196,9 +221,70 @@ static struct net_device *mlx5_ib_get_netdev(struct ib_device *device,
 		dev_hold(ndev);
 	read_unlock(&ibdev->roce[port_num - 1].netdev_lock);
 
+out:
+	mlx5_ib_put_native_port_mdev(ibdev, port_num);
 	return ndev;
 }
 
+struct mlx5_core_dev *mlx5_ib_get_native_port_mdev(struct mlx5_ib_dev *ibdev,
+						   u8 ib_port_num,
+						   u8 *native_port_num)
+{
+	enum rdma_link_layer ll = mlx5_ib_port_link_layer(&ibdev->ib_dev,
+							  ib_port_num);
+	struct mlx5_core_dev *mdev = NULL;
+	struct mlx5_ib_multiport_info *mpi;
+	struct mlx5_ib_port *port;
+
+	if (native_port_num)
+		*native_port_num = 1;
+
+	if (!mlx5_core_mp_enabled(ibdev->mdev) || ll != IB_LINK_LAYER_ETHERNET)
+		return ibdev->mdev;
+
+	port = &ibdev->port[ib_port_num - 1];
+	if (!port)
+		return NULL;
+
+	spin_lock(&port->mp.mpi_lock);
+	mpi = ibdev->port[ib_port_num - 1].mp.mpi;
+	if (mpi && !mpi->unaffiliate) {
+		mdev = mpi->mdev;
+		/* If it's the master no need to refcount, it'll exist
+		 * as long as the ib_dev exists.
+		 */
+		if (!mpi->is_master)
+			mpi->mdev_refcnt++;
+	}
+	spin_unlock(&port->mp.mpi_lock);
+
+	return mdev;
+}
+
+void mlx5_ib_put_native_port_mdev(struct mlx5_ib_dev *ibdev, u8 port_num)
+{
+	enum rdma_link_layer ll = mlx5_ib_port_link_layer(&ibdev->ib_dev,
+							  port_num);
+	struct mlx5_ib_multiport_info *mpi;
+	struct mlx5_ib_port *port;
+
+	if (!mlx5_core_mp_enabled(ibdev->mdev) || ll != IB_LINK_LAYER_ETHERNET)
+		return;
+
+	port = &ibdev->port[port_num - 1];
+
+	spin_lock(&port->mp.mpi_lock);
+	mpi = ibdev->port[port_num - 1].mp.mpi;
+	if (mpi->is_master)
+		goto out;
+
+	mpi->mdev_refcnt--;
+	if (mpi->unaffiliate)
+		complete(&mpi->unref_comp);
+out:
+	spin_unlock(&port->mp.mpi_lock);
+}
+
 static int translate_eth_proto_oper(u32 eth_proto_oper, u8 *active_speed,
 				    u8 *active_width)
 {
@@ -3160,12 +3246,11 @@ static void get_ext_port_caps(struct mlx5_ib_dev *dev)
 		mlx5_query_ext_port_caps(dev, port);
 }
 
-static int get_port_caps(struct mlx5_ib_dev *dev)
+static int get_port_caps(struct mlx5_ib_dev *dev, u8 port)
 {
 	struct ib_device_attr *dprops = NULL;
 	struct ib_port_attr *pprops = NULL;
 	int err = -ENOMEM;
-	int port;
 	struct ib_udata uhw = {.inlen = 0, .outlen = 0};
 
 	pprops = kmalloc(sizeof(*pprops), GFP_KERNEL);
@@ -3186,22 +3271,21 @@ static int get_port_caps(struct mlx5_ib_dev *dev)
 		goto out;
 	}
 
-	for (port = 1; port <= dev->num_ports; port++) {
-		memset(pprops, 0, sizeof(*pprops));
-		err = mlx5_ib_query_port(&dev->ib_dev, port, pprops);
-		if (err) {
-			mlx5_ib_warn(dev, "query_port %d failed %d\n",
-				     port, err);
-			break;
-		}
-		dev->mdev->port_caps[port - 1].pkey_table_len =
-						dprops->max_pkeys;
-		dev->mdev->port_caps[port - 1].gid_table_len =
-						pprops->gid_tbl_len;
-		mlx5_ib_dbg(dev, "pkey_table_len %d, gid_table_len %d\n",
-			    dprops->max_pkeys, pprops->gid_tbl_len);
+	memset(pprops, 0, sizeof(*pprops));
+	err = mlx5_ib_query_port(&dev->ib_dev, port, pprops);
+	if (err) {
+		mlx5_ib_warn(dev, "query_port %d failed %d\n",
+			     port, err);
+		goto out;
 	}
 
+	dev->mdev->port_caps[port - 1].pkey_table_len =
+					dprops->max_pkeys;
+	dev->mdev->port_caps[port - 1].gid_table_len =
+					pprops->gid_tbl_len;
+	mlx5_ib_dbg(dev, "port %d: pkey_table_len %d, gid_table_len %d\n",
+		    port, dprops->max_pkeys, pprops->gid_tbl_len);
+
 out:
 	kfree(pprops);
 	kfree(dprops);
@@ -4054,8 +4138,203 @@ mlx5_ib_get_vector_affinity(struct ib_device *ibdev, int comp_vector)
 	return mlx5_get_vector_affinity(dev->mdev, comp_vector);
 }
 
+/* The mlx5_ib_multiport_mutex should be held when calling this function */
+static void mlx5_ib_unbind_slave_port(struct mlx5_ib_dev *ibdev,
+				      struct mlx5_ib_multiport_info *mpi)
+{
+	u8 port_num = mlx5_core_native_port_num(mpi->mdev) - 1;
+	struct mlx5_ib_port *port = &ibdev->port[port_num];
+	int comps;
+	int err;
+	int i;
+
+	spin_lock(&port->mp.mpi_lock);
+	if (!mpi->ibdev) {
+		spin_unlock(&port->mp.mpi_lock);
+		return;
+	}
+	mpi->ibdev = NULL;
+
+	spin_unlock(&port->mp.mpi_lock);
+	mlx5_remove_netdev_notifier(ibdev, port_num);
+	spin_lock(&port->mp.mpi_lock);
+
+	comps = mpi->mdev_refcnt;
+	if (comps) {
+		mpi->unaffiliate = true;
+		init_completion(&mpi->unref_comp);
+		spin_unlock(&port->mp.mpi_lock);
+
+		for (i = 0; i < comps; i++)
+			wait_for_completion(&mpi->unref_comp);
+
+		spin_lock(&port->mp.mpi_lock);
+		mpi->unaffiliate = false;
+	}
+
+	port->mp.mpi = NULL;
+
+	list_add_tail(&mpi->list, &mlx5_ib_unaffiliated_port_list);
+
+	spin_unlock(&port->mp.mpi_lock);
+
+	err = mlx5_nic_vport_unaffiliate_multiport(mpi->mdev);
+
+	mlx5_ib_dbg(ibdev, "unaffiliated port %d\n", port_num + 1);
+	/* Log an error, still needed to cleanup the pointers and add
+	 * it back to the list.
+	 */
+	if (err)
+		mlx5_ib_err(ibdev, "Failed to unaffiliate port %u\n",
+			    port_num + 1);
+
+	ibdev->roce[port_num].last_port_state = IB_PORT_DOWN;
+}
+
+/* The mlx5_ib_multiport_mutex should be held when calling this function */
+static bool mlx5_ib_bind_slave_port(struct mlx5_ib_dev *ibdev,
+				    struct mlx5_ib_multiport_info *mpi)
+{
+	u8 port_num = mlx5_core_native_port_num(mpi->mdev) - 1;
+	int err;
+
+	spin_lock(&ibdev->port[port_num].mp.mpi_lock);
+	if (ibdev->port[port_num].mp.mpi) {
+		mlx5_ib_warn(ibdev, "port %d already affiliated.\n",
+			     port_num + 1);
+		spin_unlock(&ibdev->port[port_num].mp.mpi_lock);
+		return false;
+	}
+
+	ibdev->port[port_num].mp.mpi = mpi;
+	mpi->ibdev = ibdev;
+	spin_unlock(&ibdev->port[port_num].mp.mpi_lock);
+
+	err = mlx5_nic_vport_affiliate_multiport(ibdev->mdev, mpi->mdev);
+	if (err)
+		goto unbind;
+
+	err = get_port_caps(ibdev, mlx5_core_native_port_num(mpi->mdev));
+	if (err)
+		goto unbind;
+
+	err = mlx5_add_netdev_notifier(ibdev, port_num);
+	if (err) {
+		mlx5_ib_err(ibdev, "failed adding netdev notifier for port %u\n",
+			    port_num + 1);
+		goto unbind;
+	}
+
+	return true;
+
+unbind:
+	mlx5_ib_unbind_slave_port(ibdev, mpi);
+	return false;
+}
+
+static int mlx5_ib_init_multiport_master(struct mlx5_ib_dev *dev)
+{
+	int port_num = mlx5_core_native_port_num(dev->mdev) - 1;
+	enum rdma_link_layer ll = mlx5_ib_port_link_layer(&dev->ib_dev,
+							  port_num + 1);
+	struct mlx5_ib_multiport_info *mpi;
+	int err;
+	int i;
+
+	if (!mlx5_core_is_mp_master(dev->mdev) || ll != IB_LINK_LAYER_ETHERNET)
+		return 0;
+
+	err = mlx5_query_nic_vport_system_image_guid(dev->mdev,
+						     &dev->sys_image_guid);
+	if (err)
+		return err;
+
+	err = mlx5_nic_vport_enable_roce(dev->mdev);
+	if (err)
+		return err;
+
+	mutex_lock(&mlx5_ib_multiport_mutex);
+	for (i = 0; i < dev->num_ports; i++) {
+		bool bound = false;
+
+		/* build a stub multiport info struct for the native port. */
+		if (i == port_num) {
+			mpi = kzalloc(sizeof(*mpi), GFP_KERNEL);
+			if (!mpi) {
+				mutex_unlock(&mlx5_ib_multiport_mutex);
+				mlx5_nic_vport_disable_roce(dev->mdev);
+				return -ENOMEM;
+			}
+
+			mpi->is_master = true;
+			mpi->mdev = dev->mdev;
+			mpi->sys_image_guid = dev->sys_image_guid;
+			dev->port[i].mp.mpi = mpi;
+			mpi->ibdev = dev;
+			mpi = NULL;
+			continue;
+		}
+
+		list_for_each_entry(mpi, &mlx5_ib_unaffiliated_port_list,
+				    list) {
+			if (dev->sys_image_guid == mpi->sys_image_guid &&
+			    (mlx5_core_native_port_num(mpi->mdev) - 1) == i) {
+				bound = mlx5_ib_bind_slave_port(dev, mpi);
+			}
+
+			if (bound) {
+				dev_dbg(&mpi->mdev->pdev->dev, "removing port from unaffiliated list.\n");
+				mlx5_ib_dbg(dev, "port %d bound\n", i + 1);
+				list_del(&mpi->list);
+				break;
+			}
+		}
+		if (!bound) {
+			get_port_caps(dev, i + 1);
+			mlx5_ib_dbg(dev, "no free port found for port %d\n",
+				    i + 1);
+		}
+	}
+
+	list_add_tail(&dev->ib_dev_list, &mlx5_ib_dev_list);
+	mutex_unlock(&mlx5_ib_multiport_mutex);
+	return err;
+}
+
+static void mlx5_ib_cleanup_multiport_master(struct mlx5_ib_dev *dev)
+{
+	int port_num = mlx5_core_native_port_num(dev->mdev) - 1;
+	enum rdma_link_layer ll = mlx5_ib_port_link_layer(&dev->ib_dev,
+							  port_num + 1);
+	int i;
+
+	if (!mlx5_core_is_mp_master(dev->mdev) || ll != IB_LINK_LAYER_ETHERNET)
+		return;
+
+	mutex_lock(&mlx5_ib_multiport_mutex);
+	for (i = 0; i < dev->num_ports; i++) {
+		if (dev->port[i].mp.mpi) {
+			/* Destroy the native port stub */
+			if (i == port_num) {
+				kfree(dev->port[i].mp.mpi);
+				dev->port[i].mp.mpi = NULL;
+			} else {
+				mlx5_ib_dbg(dev, "unbinding port_num: %d\n", i + 1);
+				mlx5_ib_unbind_slave_port(dev, dev->port[i].mp.mpi);
+			}
+		}
+	}
+
+	mlx5_ib_dbg(dev, "removing from devlist\n");
+	list_del(&dev->ib_dev_list);
+	mutex_unlock(&mlx5_ib_multiport_mutex);
+
+	mlx5_nic_vport_disable_roce(dev->mdev);
+}
+
 static void mlx5_ib_stage_init_cleanup(struct mlx5_ib_dev *dev)
 {
+	mlx5_ib_cleanup_multiport_master(dev);
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 	cleanup_srcu_struct(&dev->mr_srcu);
 #endif
@@ -4067,16 +4346,36 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	struct mlx5_core_dev *mdev = dev->mdev;
 	const char *name;
 	int err;
+	int i;
 
 	dev->port = kcalloc(dev->num_ports, sizeof(*dev->port),
 			    GFP_KERNEL);
 	if (!dev->port)
 		return -ENOMEM;
 
-	err = get_port_caps(dev);
+	for (i = 0; i < dev->num_ports; i++) {
+		spin_lock_init(&dev->port[i].mp.mpi_lock);
+		rwlock_init(&dev->roce[i].netdev_lock);
+	}
+
+	err = mlx5_ib_init_multiport_master(dev);
 	if (err)
 		goto err_free_port;
 
+	if (!mlx5_core_mp_enabled(mdev)) {
+		int i;
+
+		for (i = 1; i <= dev->num_ports; i++) {
+			err = get_port_caps(dev, i);
+			if (err)
+				break;
+		}
+	} else {
+		err = get_port_caps(dev, mlx5_core_native_port_num(mdev));
+	}
+	if (err)
+		goto err_mp;
+
 	if (mlx5_use_mad_ifc(dev))
 		get_ext_port_caps(dev);
 
@@ -4106,6 +4405,8 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 #endif
 
 	return 0;
+err_mp:
+	mlx5_ib_cleanup_multiport_master(dev);
 
 err_free_port:
 	kfree(dev->port);
@@ -4252,16 +4553,16 @@ static int mlx5_ib_stage_roce_init(struct mlx5_ib_dev *dev)
 	struct mlx5_core_dev *mdev = dev->mdev;
 	enum rdma_link_layer ll;
 	int port_type_cap;
-	u8 port_num = 0;
+	u8 port_num;
 	int err;
 	int i;
 
+	port_num = mlx5_core_native_port_num(dev->mdev) - 1;
 	port_type_cap = MLX5_CAP_GEN(mdev, port_type);
 	ll = mlx5_port_type_cap_to_rdma_ll(port_type_cap);
 
 	if (ll == IB_LINK_LAYER_ETHERNET) {
 		for (i = 0; i < dev->num_ports; i++) {
-			rwlock_init(&dev->roce[i].netdev_lock);
 			dev->roce[i].dev = dev;
 			dev->roce[i].native_port_num = i + 1;
 			dev->roce[i].last_port_state = IB_PORT_DOWN;
@@ -4292,8 +4593,9 @@ static void mlx5_ib_stage_roce_cleanup(struct mlx5_ib_dev *dev)
 	struct mlx5_core_dev *mdev = dev->mdev;
 	enum rdma_link_layer ll;
 	int port_type_cap;
-	u8 port_num = 0;
+	u8 port_num;
 
+	port_num = mlx5_core_native_port_num(dev->mdev) - 1;
 	port_type_cap = MLX5_CAP_GEN(mdev, port_type);
 	ll = mlx5_port_type_cap_to_rdma_ll(port_type_cap);
 
@@ -4443,6 +4745,8 @@ static void __mlx5_ib_remove(struct mlx5_ib_dev *dev,
 	ib_dealloc_device((struct ib_device *)dev);
 }
 
+static void *mlx5_ib_add_slave_port(struct mlx5_core_dev *mdev, u8 port_num);
+
 static void *__mlx5_ib_add(struct mlx5_core_dev *mdev,
 			   const struct mlx5_ib_profile *profile)
 {
@@ -4457,7 +4761,8 @@ static void *__mlx5_ib_add(struct mlx5_core_dev *mdev,
 		return NULL;
 
 	dev->mdev = mdev;
-	dev->num_ports = MLX5_CAP_GEN(mdev, num_ports);
+	dev->num_ports = max(MLX5_CAP_GEN(mdev, num_ports),
+			     MLX5_CAP_GEN(mdev, num_vhca_ports));
 
 	for (i = 0; i < MLX5_IB_STAGE_MAX; i++) {
 		if (profile->stage[i].init) {
@@ -4520,15 +4825,81 @@ static const struct mlx5_ib_profile pf_profile = {
 		     NULL),
 };
 
+static void *mlx5_ib_add_slave_port(struct mlx5_core_dev *mdev, u8 port_num)
+{
+	struct mlx5_ib_multiport_info *mpi;
+	struct mlx5_ib_dev *dev;
+	bool bound = false;
+	int err;
+
+	mpi = kzalloc(sizeof(*mpi), GFP_KERNEL);
+	if (!mpi)
+		return NULL;
+
+	mpi->mdev = mdev;
+
+	err = mlx5_query_nic_vport_system_image_guid(mdev,
+						     &mpi->sys_image_guid);
+	if (err) {
+		kfree(mpi);
+		return NULL;
+	}
+
+	mutex_lock(&mlx5_ib_multiport_mutex);
+	list_for_each_entry(dev, &mlx5_ib_dev_list, ib_dev_list) {
+		if (dev->sys_image_guid == mpi->sys_image_guid)
+			bound = mlx5_ib_bind_slave_port(dev, mpi);
+
+		if (bound) {
+			rdma_roce_rescan_device(&dev->ib_dev);
+			break;
+		}
+	}
+
+	if (!bound) {
+		list_add_tail(&mpi->list, &mlx5_ib_unaffiliated_port_list);
+		dev_dbg(&mdev->pdev->dev, "no suitable IB device found to bind to, added to unaffiliated list.\n");
+	} else {
+		mlx5_ib_dbg(dev, "bound port %u\n", port_num + 1);
+	}
+	mutex_unlock(&mlx5_ib_multiport_mutex);
+
+	return mpi;
+}
+
 static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 {
+	enum rdma_link_layer ll;
+	int port_type_cap;
+
+	port_type_cap = MLX5_CAP_GEN(mdev, port_type);
+	ll = mlx5_port_type_cap_to_rdma_ll(port_type_cap);
+
+	if (mlx5_core_is_mp_slave(mdev) && ll == IB_LINK_LAYER_ETHERNET) {
+		u8 port_num = mlx5_core_native_port_num(mdev) - 1;
+
+		return mlx5_ib_add_slave_port(mdev, port_num);
+	}
+
 	return __mlx5_ib_add(mdev, &pf_profile);
 }
 
 static void mlx5_ib_remove(struct mlx5_core_dev *mdev, void *context)
 {
-	struct mlx5_ib_dev *dev = context;
+	struct mlx5_ib_multiport_info *mpi;
+	struct mlx5_ib_dev *dev;
+
+	if (mlx5_core_is_mp_slave(mdev)) {
+		mpi = context;
+		mutex_lock(&mlx5_ib_multiport_mutex);
+		if (mpi->ibdev)
+			mlx5_ib_unbind_slave_port(mpi->ibdev, mpi);
+		list_del(&mpi->list);
+		mutex_unlock(&mlx5_ib_multiport_mutex);
+		return;
+	}
 
+	dev = context;
 	__mlx5_ib_remove(dev, dev->profile, MLX5_IB_STAGE_MAX);
 }
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 6106dde35144..e1667e72ad49 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -654,8 +654,17 @@ struct mlx5_ib_counters {
 	u16 set_id;
 };
 
+struct mlx5_ib_multiport_info;
+
+struct mlx5_ib_multiport {
+	struct mlx5_ib_multiport_info *mpi;
+	/* To be held when accessing the multiport info */
+	spinlock_t mpi_lock;
+};
+
 struct mlx5_ib_port {
 	struct mlx5_ib_counters cnts;
+	struct mlx5_ib_multiport mp;
 };
 
 struct mlx5_roce {
@@ -756,6 +765,29 @@ struct mlx5_ib_profile {
 	struct mlx5_ib_stage stage[MLX5_IB_STAGE_MAX];
 };
 
+struct mlx5_ib_odp {
+	struct ib_odp_caps	caps;
+	u64			max_size;
+	/*
+	 * Sleepable RCU that prevents destruction of MRs while they are still
+	 * being used by a page fault handler.
+	 */
+	struct srcu_struct      mr_srcu;
+	u32			null_mkey;
+	void			(*sync)(struct mlx5_ib_dev *dev);
+};
+
+struct mlx5_ib_multiport_info {
+	struct list_head list;
+	struct mlx5_ib_dev *ibdev;
+	struct mlx5_core_dev *mdev;
+	struct completion unref_comp;
+	u64 sys_image_guid;
+	u32 mdev_refcnt;
+	bool is_master;
+	bool unaffiliate;
+};
+
 struct mlx5_ib_dev {
 	struct ib_device		ib_dev;
 	struct mlx5_core_dev		*mdev;
@@ -800,6 +832,8 @@ struct mlx5_ib_dev {
 	struct mutex		lb_mutex;
 	u32			user_td;
 	u8			umr_fence;
+	struct list_head	ib_dev_list;
+	u64			sys_image_guid;
 };
 
 static inline struct mlx5_ib_cq *to_mibcq(struct mlx5_core_cq *mcq)
@@ -1071,6 +1105,12 @@ int mlx5_ib_generate_wc(struct ib_cq *ibcq, struct ib_wc *wc);
 
 void mlx5_ib_free_bfreg(struct mlx5_ib_dev *dev, struct mlx5_bfreg_info *bfregi,
 			int bfregn);
+struct mlx5_ib_dev *mlx5_ib_get_ibdev_from_mpi(struct mlx5_ib_multiport_info *mpi);
+struct mlx5_core_dev *mlx5_ib_get_native_port_mdev(struct mlx5_ib_dev *dev,
+						   u8 ib_port_num,
+						   u8 *native_port_num);
+void mlx5_ib_put_native_port_mdev(struct mlx5_ib_dev *dev,
+				  u8 port_num);
 
 static inline void init_query_mad(struct ib_smp *mad)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
index c4392f741c5f..c841b03c3e48 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
@@ -688,7 +688,7 @@ static inline int mlx5_fpga_conn_init_qp(struct mlx5_fpga_conn *conn)
 	MLX5_SET(qpc, qpc, st, MLX5_QP_ST_RC);
 	MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
 	MLX5_SET(qpc, qpc, primary_address_path.pkey_index, MLX5_FPGA_PKEY_INDEX);
-	MLX5_SET(qpc, qpc, primary_address_path.port, MLX5_FPGA_PORT_NUM);
+	MLX5_SET(qpc, qpc, primary_address_path.vhca_port_num, MLX5_FPGA_PORT_NUM);
 	MLX5_SET(qpc, qpc, pd, conn->fdev->conn_res.pdn);
 	MLX5_SET(qpc, qpc, cqn_snd, conn->cq.mcq.cqn);
 	MLX5_SET(qpc, qpc, cqn_rcv, conn->cq.mcq.cqn);
@@ -727,7 +727,7 @@ static inline int mlx5_fpga_conn_rtr_qp(struct mlx5_fpga_conn *conn)
 	MLX5_SET(qpc, qpc, next_rcv_psn,
 		 MLX5_GET(fpga_qpc, conn->fpga_qpc, next_send_psn));
 	MLX5_SET(qpc, qpc, primary_address_path.pkey_index, MLX5_FPGA_PKEY_INDEX);
-	MLX5_SET(qpc, qpc, primary_address_path.port, MLX5_FPGA_PORT_NUM);
+	MLX5_SET(qpc, qpc, primary_address_path.vhca_port_num, MLX5_FPGA_PORT_NUM);
 	ether_addr_copy(MLX5_ADDR_OF(qpc, qpc, primary_address_path.rmac_47_32),
 			MLX5_ADDR_OF(fpga_qpc, conn->fpga_qpc, fpga_mac_47_32));
 	MLX5_SET(qpc, qpc, primary_address_path.udp_sport,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index d2a66dc4adc6..261b95d014a0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -187,7 +187,7 @@ int mlx5i_create_underlay_qp(struct mlx5_core_dev *mdev, struct mlx5_core_qp *qp
 		 MLX5_QP_ENHANCED_ULP_STATELESS_MODE);
 
 	addr_path = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
-	MLX5_SET(ads, addr_path, port, 1);
+	MLX5_SET(ads, addr_path, vhca_port_num, 1);
 	MLX5_SET(ads, addr_path, grh, 1);
 
 	ret = mlx5_core_create_qp(mdev, qp, in, inlen);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 916523103f16..9cb939b6a859 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -1121,3 +1121,61 @@ int mlx5_core_modify_hca_vport_context(struct mlx5_core_dev *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_core_modify_hca_vport_context);
+
+int mlx5_nic_vport_affiliate_multiport(struct mlx5_core_dev *master_mdev,
+				       struct mlx5_core_dev *port_mdev)
+{
+	int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+	void *in;
+	int err;
+
+	in = kvzalloc(inlen, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	err = mlx5_nic_vport_enable_roce(port_mdev);
+	if (err)
+		goto free;
+
+	MLX5_SET(modify_nic_vport_context_in, in, field_select.affiliation, 1);
+	MLX5_SET(modify_nic_vport_context_in, in,
+		 nic_vport_context.affiliated_vhca_id,
+		 MLX5_CAP_GEN(master_mdev, vhca_id));
+	MLX5_SET(modify_nic_vport_context_in, in,
+		 nic_vport_context.affiliation_criteria,
+		 MLX5_CAP_GEN(port_mdev, affiliate_nic_vport_criteria));
+
+	err = mlx5_modify_nic_vport_context(port_mdev, in, inlen);
+	if (err)
+		mlx5_nic_vport_disable_roce(port_mdev);
+
+free:
+	kvfree(in);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_nic_vport_affiliate_multiport);
+
+int mlx5_nic_vport_unaffiliate_multiport(struct mlx5_core_dev *port_mdev)
+{
+	int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+	void *in;
+	int err;
+
+	in = kvzalloc(inlen, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(modify_nic_vport_context_in, in, field_select.affiliation, 1);
+	MLX5_SET(modify_nic_vport_context_in, in,
+		 nic_vport_context.affiliated_vhca_id, 0);
+	MLX5_SET(modify_nic_vport_context_in, in,
+		 nic_vport_context.affiliation_criteria, 0);
+
+	err = mlx5_modify_nic_vport_context(port_mdev, in, inlen);
+	if (!err)
+		mlx5_nic_vport_disable_roce(port_mdev);
+
+	kvfree(in);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_nic_vport_unaffiliate_multiport);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 28733529f6ff..d5c787519e06 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1234,9 +1234,29 @@ static inline bool mlx5_rl_is_supported(struct mlx5_core_dev *dev)
 	return !!(dev->priv.rl_table.max_size);
 }
 
+static inline int mlx5_core_is_mp_slave(struct mlx5_core_dev *dev)
+{
+	return MLX5_CAP_GEN(dev, affiliate_nic_vport_criteria) &&
+	       MLX5_CAP_GEN(dev, num_vhca_ports) <= 1;
+}
+
+static inline int mlx5_core_is_mp_master(struct mlx5_core_dev *dev)
+{
+	return MLX5_CAP_GEN(dev, num_vhca_ports) > 1;
+}
+
+static inline int mlx5_core_mp_enabled(struct mlx5_core_dev *dev)
+{
+	return mlx5_core_is_mp_slave(dev) ||
+	       mlx5_core_is_mp_master(dev);
+}
+
 static inline int mlx5_core_native_port_num(struct mlx5_core_dev *dev)
 {
-	return 1;
+	if (!mlx5_core_mp_enabled(dev))
+		return 1;
+
+	return MLX5_CAP_GEN(dev, native_port_num);
 }
 
 enum {
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index b1c81d7a86cb..7e88c8e7f374 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -502,7 +502,7 @@ struct mlx5_ifc_ads_bits {
 	u8         dei_cfi[0x1];
 	u8         eth_prio[0x3];
 	u8         sl[0x4];
-	u8         port[0x8];
+	u8         vhca_port_num[0x8];
 	u8         rmac_47_32[0x10];
 
 	u8         rmac_31_0[0x20];
@@ -794,7 +794,10 @@ enum {
 };
 
 struct mlx5_ifc_cmd_hca_cap_bits {
-	u8         reserved_at_0[0x80];
+	u8         reserved_at_0[0x30];
+	u8         vhca_id[0x10];
+
+	u8         reserved_at_40[0x40];
 
 	u8         log_max_srq_sz[0x8];
 	u8         log_max_qp_sz[0x8];
@@ -1066,8 +1069,11 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         reserved_at_5f8[0x3];
 	u8         log_max_xrq[0x5];
 
-	u8         reserved_at_600[0x1e];
-	u8	   sw_owner_id;
+	u8	   affiliate_nic_vport_criteria[0x8];
+	u8	   native_port_num[0x8];
+	u8	   num_vhca_ports[0x8];
+	u8	   reserved_at_618[0x6];
+	u8	   sw_owner_id[0x1];
 	u8	   reserved_at_61f[0x1e1];
 };
 
@@ -2617,7 +2623,12 @@ struct mlx5_ifc_nic_vport_context_bits {
 	u8         event_on_mc_address_change[0x1];
 	u8         event_on_uc_address_change[0x1];
 
-	u8         reserved_at_40[0xf0];
+	u8         reserved_at_40[0xc];
+
+	u8	   affiliation_criteria[0x4];
+	u8	   affiliated_vhca_id[0x10];
+
+	u8	   reserved_at_60[0xd0];
 
 	u8         mtu[0x10];
 
@@ -3260,7 +3271,8 @@ struct mlx5_ifc_set_roce_address_in_bits {
 	u8         op_mod[0x10];
 
 	u8         roce_address_index[0x10];
-	u8         reserved_at_50[0x10];
+	u8         reserved_at_50[0xc];
+	u8	   vhca_port_num[0x4];
 
 	u8         reserved_at_60[0x20];
 
@@ -3880,7 +3892,8 @@ struct mlx5_ifc_query_roce_address_in_bits {
 	u8         op_mod[0x10];
 
 	u8         roce_address_index[0x10];
-	u8         reserved_at_50[0x10];
+	u8         reserved_at_50[0xc];
+	u8	   vhca_port_num[0x4];
 
 	u8         reserved_at_60[0x20];
 };
@@ -5312,7 +5325,9 @@ struct mlx5_ifc_modify_nic_vport_context_out_bits {
 };
 
 struct mlx5_ifc_modify_nic_vport_field_select_bits {
-	u8         reserved_at_0[0x14];
+	u8         reserved_at_0[0x12];
+	u8	   affiliation[0x1];
+	u8	   reserved_at_e[0x1];
 	u8         disable_uc_local_lb[0x1];
 	u8         disable_mc_local_lb[0x1];
 	u8         node_guid[0x1];
diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h
index aaa0bb9e7655..64e193e87394 100644
--- a/include/linux/mlx5/vport.h
+++ b/include/linux/mlx5/vport.h
@@ -116,4 +116,8 @@ int mlx5_core_modify_hca_vport_context(struct mlx5_core_dev *dev,
 				       struct mlx5_hca_vport_context *req);
 int mlx5_nic_vport_update_local_lb(struct mlx5_core_dev *mdev, bool enable);
 int mlx5_nic_vport_query_local_lb(struct mlx5_core_dev *mdev, bool *status);
+
+int mlx5_nic_vport_affiliate_multiport(struct mlx5_core_dev *master_mdev,
+				       struct mlx5_core_dev *port_mdev);
+int mlx5_nic_vport_unaffiliate_multiport(struct mlx5_core_dev *port_mdev);
 #endif /* __MLX5_VPORT_H__ */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0c8ec8c0211e..41d4f59cac5e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3871,4 +3871,12 @@ ib_get_vector_affinity(struct ib_device *device, int comp_vector)
 
 }
 
+/**
+ * rdma_roce_rescan_device - Rescan all of the network devices in the system
+ * and add their gids, as needed, to the relevant RoCE devices.
+ *
+ * @device:         the rdma device
+ */
+void rdma_roce_rescan_device(struct ib_device *ibdev);
+
 #endif /* IB_VERBS_H */
-- 
2.15.1


* [PATCH rdma-next v2 07/14] IB/mlx5: Move IB event processing onto a workqueue
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (5 preceding siblings ...)
  2018-01-04 15:25   ` [PATCH rdma-next v2 06/14] {net,IB}/mlx5: Manage port association for multiport RoCE Leon Romanovsky
@ 2018-01-04 15:25   ` Leon Romanovsky
  2018-01-04 15:25   ` [PATCH rdma-next v2 08/14] IB/mlx5: Implement dual port functionality in query routines Leon Romanovsky
                     ` (8 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

mlx5_ib_event can be called from atomic context, and a mutex must be
taken to look up the IB device for slave ports, so move event handling
onto a workqueue. When an IB event is received, check whether the
mlx5_core_dev is a slave port; if so, attempt to get the IB device it
is affiliated with. If one is found, process the event for that device;
otherwise return.
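
For reference, a minimal sketch of the deferral pattern, using
illustrative example_* names rather than the driver's own:

#include <linux/slab.h>
#include <linux/workqueue.h>

struct example_event_work {
	struct work_struct work;
	unsigned long param;		/* event data captured at enqueue */
};

static struct workqueue_struct *example_wq;

static void example_handle_event(struct work_struct *_work)
{
	struct example_event_work *work =
		container_of(_work, struct example_event_work, work);

	/* Process context: taking mutexes and sleeping is allowed. */
	kfree(work);
}

/* May be called from atomic context: GFP_ATOMIC allocation only. */
static void example_event_cb(unsigned long param)
{
	struct example_event_work *work;

	work = kmalloc(sizeof(*work), GFP_ATOMIC);
	if (!work)
		return;			/* event is dropped */

	INIT_WORK(&work->work, example_handle_event);
	work->param = param;
	queue_work(example_wq, &work->work);
}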

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c | 69 +++++++++++++++++++++++++++++++--------
 1 file changed, 56 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 4fbbe4c7a99b..0ff6da1b885f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -70,10 +70,19 @@ static char mlx5_version[] =
 	DRIVER_NAME ": Mellanox Connect-IB Infiniband driver v"
 	DRIVER_VERSION "\n";
 
+struct mlx5_ib_event_work {
+	struct work_struct	work;
+	struct mlx5_core_dev	*dev;
+	void			*context;
+	enum mlx5_dev_event	event;
+	unsigned long		param;
+};
+
 enum {
 	MLX5_ATOMIC_SIZE_QP_8BYTES = 1 << 3,
 };
 
+static struct workqueue_struct *mlx5_ib_event_wq;
 static LIST_HEAD(mlx5_ib_unaffiliated_port_list);
 static LIST_HEAD(mlx5_ib_dev_list);
 /*
@@ -3132,15 +3141,24 @@ static void delay_drop_handler(struct work_struct *work)
 	mutex_unlock(&delay_drop->lock);
 }
 
-static void mlx5_ib_event(struct mlx5_core_dev *dev, void *context,
-			  enum mlx5_dev_event event, unsigned long param)
+static void mlx5_ib_handle_event(struct work_struct *_work)
 {
-	struct mlx5_ib_dev *ibdev = (struct mlx5_ib_dev *)context;
+	struct mlx5_ib_event_work *work =
+		container_of(_work, struct mlx5_ib_event_work, work);
+	struct mlx5_ib_dev *ibdev;
 	struct ib_event ibev;
 	bool fatal = false;
 	u8 port = 0;
 
-	switch (event) {
+	if (mlx5_core_is_mp_slave(work->dev)) {
+		ibdev = mlx5_ib_get_ibdev_from_mpi(work->context);
+		if (!ibdev)
+			goto out;
+	} else {
+		ibdev = work->context;
+	}
+
+	switch (work->event) {
 	case MLX5_DEV_EVENT_SYS_ERROR:
 		ibev.event = IB_EVENT_DEVICE_FATAL;
 		mlx5_ib_handle_internal_error(ibdev);
@@ -3150,39 +3168,39 @@ static void mlx5_ib_event(struct mlx5_core_dev *dev, void *context,
 	case MLX5_DEV_EVENT_PORT_UP:
 	case MLX5_DEV_EVENT_PORT_DOWN:
 	case MLX5_DEV_EVENT_PORT_INITIALIZED:
-		port = (u8)param;
+		port = (u8)work->param;
 
 		/* In RoCE, port up/down events are handled in
 		 * mlx5_netdev_event().
 		 */
 		if (mlx5_ib_port_link_layer(&ibdev->ib_dev, port) ==
 			IB_LINK_LAYER_ETHERNET)
-			return;
+			goto out;
 
-		ibev.event = (event == MLX5_DEV_EVENT_PORT_UP) ?
+		ibev.event = (work->event == MLX5_DEV_EVENT_PORT_UP) ?
 			     IB_EVENT_PORT_ACTIVE : IB_EVENT_PORT_ERR;
 		break;
 
 	case MLX5_DEV_EVENT_LID_CHANGE:
 		ibev.event = IB_EVENT_LID_CHANGE;
-		port = (u8)param;
+		port = (u8)work->param;
 		break;
 
 	case MLX5_DEV_EVENT_PKEY_CHANGE:
 		ibev.event = IB_EVENT_PKEY_CHANGE;
-		port = (u8)param;
+		port = (u8)work->param;
 
 		schedule_work(&ibdev->devr.ports[port - 1].pkey_change_work);
 		break;
 
 	case MLX5_DEV_EVENT_GUID_CHANGE:
 		ibev.event = IB_EVENT_GID_CHANGE;
-		port = (u8)param;
+		port = (u8)work->param;
 		break;
 
 	case MLX5_DEV_EVENT_CLIENT_REREG:
 		ibev.event = IB_EVENT_CLIENT_REREGISTER;
-		port = (u8)param;
+		port = (u8)work->param;
 		break;
 	case MLX5_DEV_EVENT_DELAY_DROP_TIMEOUT:
 		schedule_work(&ibdev->delay_drop.delay_drop_work);
@@ -3204,9 +3222,29 @@ static void mlx5_ib_event(struct mlx5_core_dev *dev, void *context,
 
 	if (fatal)
 		ibdev->ib_active = false;
-
 out:
-	return;
+	kfree(work);
+}
+
+static void mlx5_ib_event(struct mlx5_core_dev *dev, void *context,
+			  enum mlx5_dev_event event, unsigned long param)
+{
+	struct mlx5_ib_event_work *work;
+
+	work = kmalloc(sizeof(*work), GFP_ATOMIC);
+	if (work) {
+		INIT_WORK(&work->work, mlx5_ib_handle_event);
+		work->dev = dev;
+		work->param = param;
+		work->context = context;
+		work->event = event;
+
+		queue_work(mlx5_ib_event_wq, &work->work);
+		return;
+	}
+
+	dev_warn(&dev->pdev->dev, "%s: mlx5_dev_event: %d, with param: %lu dropped, couldn't allocate memory.\n",
+		 __func__, event, param);
 }
 
 static int set_has_smi_cap(struct mlx5_ib_dev *dev)
@@ -4917,6 +4955,10 @@ static int __init mlx5_ib_init(void)
 {
 	int err;
 
+	mlx5_ib_event_wq = alloc_ordered_workqueue("mlx5_ib_event_wq", 0);
+	if (!mlx5_ib_event_wq)
+		return -ENOMEM;
+
 	mlx5_ib_odp_init();
 
 	err = mlx5_register_interface(&mlx5_ib_interface);
@@ -4927,6 +4969,7 @@ static int __init mlx5_ib_init(void)
 static void __exit mlx5_ib_cleanup(void)
 {
 	mlx5_unregister_interface(&mlx5_ib_interface);
+	destroy_workqueue(mlx5_ib_event_wq);
 }
 
 module_init(mlx5_ib_init);
-- 
2.15.1


* [PATCH rdma-next v2 08/14] IB/mlx5: Implement dual port functionality in query routines
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (6 preceding siblings ...)
  2018-01-04 15:25   ` [PATCH rdma-next v2 07/14] IB/mlx5: Move IB event processing onto a workqueue Leon Romanovsky
@ 2018-01-04 15:25   ` Leon Romanovsky
  2018-01-04 15:25   ` [PATCH rdma-next v2 09/14] IB/mlx5: Change debugfs to have per port contents Leon Romanovsky
                     ` (7 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Port operations must be routed to their native mlx5_core_dev. A
multiport RoCE device registers itself as having two ports even before
a second port is affiliated. If an unaffiliated port is queried, use
capability information from the master port; these values are the same.
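
As a reference for the hunks below, a minimal sketch of the
lookup-or-fall-back pattern the query routines now follow;
example_query() is illustrative, while the get/put helpers are the
ones added earlier in this series:

static int example_query(struct mlx5_ib_dev *dev, u8 port_num)
{
	struct mlx5_core_dev *mdev;
	bool put_mdev = true;
	u8 mdev_port_num;
	int err = 0;

	mdev = mlx5_ib_get_native_port_mdev(dev, port_num, &mdev_port_num);
	if (!mdev) {
		/* Port not affiliated yet: the master reports the same
		 * values, so query it instead.
		 */
		put_mdev = false;
		mdev = dev->mdev;
		mdev_port_num = 1;
	}

	/* ... issue the query against (mdev, mdev_port_num) ... */

	if (put_mdev)
		mlx5_ib_put_native_port_mdev(dev, port_num);
	return err;
}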

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c | 102 +++++++++++++++++++++++++++++++-------
 1 file changed, 85 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 0ff6da1b885f..59e26901d935 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -359,16 +359,30 @@ static int mlx5_query_port_roce(struct ib_device *device, u8 port_num,
 	struct mlx5_core_dev *mdev = dev->mdev;
 	struct net_device *ndev, *upper;
 	enum ib_mtu ndev_ib_mtu;
+	bool put_mdev = true;
 	u16 qkey_viol_cntr;
 	u32 eth_prot_oper;
+	u8 mdev_port_num;
 	int err;
 
+	mdev = mlx5_ib_get_native_port_mdev(dev, port_num, &mdev_port_num);
+	if (!mdev) {
+		/* This means the port isn't affiliated yet. Get the
+		 * info for the master port instead.
+		 */
+		put_mdev = false;
+		mdev = dev->mdev;
+		mdev_port_num = 1;
+		port_num = 1;
+	}
+
 	/* Possible bad flows are checked before filling out props so in case
 	 * of an error it will still be zeroed out.
 	 */
-	err = mlx5_query_port_eth_proto_oper(mdev, &eth_prot_oper, port_num);
+	err = mlx5_query_port_eth_proto_oper(mdev, &eth_prot_oper,
+					     mdev_port_num);
 	if (err)
-		return err;
+		goto out;
 
 	translate_eth_proto_oper(eth_prot_oper, &props->active_speed,
 				 &props->active_width);
@@ -384,12 +398,16 @@ static int mlx5_query_port_roce(struct ib_device *device, u8 port_num,
 	props->state            = IB_PORT_DOWN;
 	props->phys_state       = 3;
 
-	mlx5_query_nic_vport_qkey_viol_cntr(dev->mdev, &qkey_viol_cntr);
+	mlx5_query_nic_vport_qkey_viol_cntr(mdev, &qkey_viol_cntr);
 	props->qkey_viol_cntr = qkey_viol_cntr;
 
+	/* If this is a stub query for an unaffiliated port stop here */
+	if (!put_mdev)
+		goto out;
+
 	ndev = mlx5_ib_get_netdev(device, port_num);
 	if (!ndev)
-		return 0;
+		goto out;
 
 	if (mlx5_lag_is_active(dev->mdev)) {
 		rcu_read_lock();
@@ -412,7 +430,10 @@ static int mlx5_query_port_roce(struct ib_device *device, u8 port_num,
 	dev_put(ndev);
 
 	props->active_mtu	= min(props->max_mtu, ndev_ib_mtu);
-	return 0;
+out:
+	if (put_mdev)
+		mlx5_ib_put_native_port_mdev(dev, port_num);
+	return err;
 }
 
 static int set_roce_addr(struct mlx5_ib_dev *dev, u8 port_num,
@@ -1221,7 +1242,22 @@ int mlx5_ib_query_port(struct ib_device *ibdev, u8 port,
 	}
 
 	if (!ret && props) {
-		count = mlx5_core_reserved_gids_count(to_mdev(ibdev)->mdev);
+		struct mlx5_ib_dev *dev = to_mdev(ibdev);
+		struct mlx5_core_dev *mdev;
+		bool put_mdev = true;
+
+		mdev = mlx5_ib_get_native_port_mdev(dev, port, NULL);
+		if (!mdev) {
+			/* If the port isn't affiliated yet query the master.
+			 * The master and slave will have the same values.
+			 */
+			mdev = dev->mdev;
+			port = 1;
+			put_mdev = false;
+		}
+		count = mlx5_core_reserved_gids_count(mdev);
+		if (put_mdev)
+			mlx5_ib_put_native_port_mdev(dev, port);
 		props->gid_tbl_len -= count;
 	}
 	return ret;
@@ -1246,20 +1282,43 @@ static int mlx5_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
 
 }
 
-static int mlx5_ib_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
-			      u16 *pkey)
+static int mlx5_query_hca_nic_pkey(struct ib_device *ibdev, u8 port,
+				   u16 index, u16 *pkey)
 {
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
-	struct mlx5_core_dev *mdev = dev->mdev;
+	struct mlx5_core_dev *mdev;
+	bool put_mdev = true;
+	u8 mdev_port_num;
+	int err;
 
+	mdev = mlx5_ib_get_native_port_mdev(dev, port, &mdev_port_num);
+	if (!mdev) {
+		/* The port isn't affiliated yet, get the PKey from the master
+		 * port. For RoCE the PKey tables will be the same.
+		 */
+		put_mdev = false;
+		mdev = dev->mdev;
+		mdev_port_num = 1;
+	}
+
+	err = mlx5_query_hca_vport_pkey(mdev, 0, mdev_port_num, 0,
+					index, pkey);
+	if (put_mdev)
+		mlx5_ib_put_native_port_mdev(dev, port);
+
+	return err;
+}
+
+static int mlx5_ib_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
+			      u16 *pkey)
+{
 	switch (mlx5_get_vport_access_method(ibdev)) {
 	case MLX5_VPORT_ACCESS_METHOD_MAD:
 		return mlx5_query_mad_ifc_pkey(ibdev, port, index, pkey);
 
 	case MLX5_VPORT_ACCESS_METHOD_HCA:
 	case MLX5_VPORT_ACCESS_METHOD_NIC:
-		return mlx5_query_hca_vport_pkey(mdev, 0, port,  0, index,
-						 pkey);
+		return mlx5_query_hca_nic_pkey(ibdev, port, index, pkey);
 	default:
 		return -EINVAL;
 	}
@@ -1298,23 +1357,32 @@ static int set_port_caps_atomic(struct mlx5_ib_dev *dev, u8 port_num, u32 mask,
 				u32 value)
 {
 	struct mlx5_hca_vport_context ctx = {};
+	struct mlx5_core_dev *mdev;
+	u8 mdev_port_num;
 	int err;
 
-	err = mlx5_query_hca_vport_context(dev->mdev, 0,
-					   port_num, 0, &ctx);
+	mdev = mlx5_ib_get_native_port_mdev(dev, port_num, &mdev_port_num);
+	if (!mdev)
+		return -ENODEV;
+
+	err = mlx5_query_hca_vport_context(mdev, 0, mdev_port_num, 0, &ctx);
 	if (err)
-		return err;
+		goto out;
 
 	if (~ctx.cap_mask1_perm & mask) {
 		mlx5_ib_warn(dev, "trying to change bitmask 0x%X but change supported 0x%X\n",
 			     mask, ctx.cap_mask1_perm);
-		return -EINVAL;
+		err = -EINVAL;
+		goto out;
 	}
 
 	ctx.cap_mask1 = value;
 	ctx.cap_mask1_perm = mask;
-	err = mlx5_core_modify_hca_vport_context(dev->mdev, 0,
-						 port_num, 0, &ctx);
+	err = mlx5_core_modify_hca_vport_context(mdev, 0, mdev_port_num,
+						 0, &ctx);
+
+out:
+	mlx5_ib_put_native_port_mdev(dev, port_num);
 
 	return err;
 }
-- 
2.15.1


* [PATCH rdma-next v2 09/14] IB/mlx5: Change debugfs to have per port contents
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (7 preceding siblings ...)
  2018-01-04 15:25   ` [PATCH rdma-next v2 08/14] IB/mlx5: Implement dual port functionality in query routines Leon Romanovsky
@ 2018-01-04 15:25   ` Leon Romanovsky
  2018-01-04 15:25   ` [PATCH rdma-next v2 10/14] IB/mlx5: Update counter implementation for dual port RoCE Leon Romanovsky
                     ` (6 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When a single IB (RoCE) device has multiple ports, make debugfs
entries available for each port.
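
Note the numbering convention: the cong debugfs helpers take a 0-based
port index, while mlx5_core_native_port_num() returns a 1-based IB
port number. A sketch of a caller (example_* is illustrative):

static int example_cong_debugfs_init(struct mlx5_ib_dev *dev)
{
	u8 port_idx = mlx5_core_native_port_num(dev->mdev) - 1;

	return mlx5_ib_init_cong_debugfs(dev, port_idx);
}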

Signed-off-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/cong.c    | 83 +++++++++++++++++++++++++-----------
 drivers/infiniband/hw/mlx5/main.c    | 12 +++++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  7 +--
 3 files changed, 73 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cong.c b/drivers/infiniband/hw/mlx5/cong.c
index 2d32b519bb61..985fa2637390 100644
--- a/drivers/infiniband/hw/mlx5/cong.c
+++ b/drivers/infiniband/hw/mlx5/cong.c
@@ -247,21 +247,30 @@ static void mlx5_ib_set_cc_param_mask_val(void *field, int offset,
 	}
 }
 
-static int mlx5_ib_get_cc_params(struct mlx5_ib_dev *dev, int offset, u32 *var)
+static int mlx5_ib_get_cc_params(struct mlx5_ib_dev *dev, u8 port_num,
+				 int offset, u32 *var)
 {
 	int outlen = MLX5_ST_SZ_BYTES(query_cong_params_out);
 	void *out;
 	void *field;
 	int err;
 	enum mlx5_ib_cong_node_type node;
+	struct mlx5_core_dev *mdev;
+
+	/* Takes a 1-based port number */
+	mdev = mlx5_ib_get_native_port_mdev(dev, port_num + 1, NULL);
+	if (!mdev)
+		return -ENODEV;
 
 	out = kvzalloc(outlen, GFP_KERNEL);
-	if (!out)
-		return -ENOMEM;
+	if (!out) {
+		err = -ENOMEM;
+		goto alloc_err;
+	}
 
 	node = mlx5_ib_param_to_node(offset);
 
-	err = mlx5_cmd_query_cong_params(dev->mdev, node, out, outlen);
+	err = mlx5_cmd_query_cong_params(mdev, node, out, outlen);
 	if (err)
 		goto free;
 
@@ -270,21 +279,32 @@ static int mlx5_ib_get_cc_params(struct mlx5_ib_dev *dev, int offset, u32 *var)
 
 free:
 	kvfree(out);
+alloc_err:
+	mlx5_ib_put_native_port_mdev(dev, port_num + 1);
 	return err;
 }
 
-static int mlx5_ib_set_cc_params(struct mlx5_ib_dev *dev, int offset, u32 var)
+static int mlx5_ib_set_cc_params(struct mlx5_ib_dev *dev, u8 port_num,
+				 int offset, u32 var)
 {
 	int inlen = MLX5_ST_SZ_BYTES(modify_cong_params_in);
 	void *in;
 	void *field;
 	enum mlx5_ib_cong_node_type node;
+	struct mlx5_core_dev *mdev;
 	u32 attr_mask = 0;
 	int err;
 
+	/* Takes a 1-based port number */
+	mdev = mlx5_ib_get_native_port_mdev(dev, port_num + 1, NULL);
+	if (!mdev)
+		return -ENODEV;
+
 	in = kvzalloc(inlen, GFP_KERNEL);
-	if (!in)
-		return -ENOMEM;
+	if (!in) {
+		err = -ENOMEM;
+		goto alloc_err;
+	}
 
 	MLX5_SET(modify_cong_params_in, in, opcode,
 		 MLX5_CMD_OP_MODIFY_CONG_PARAMS);
@@ -299,8 +319,10 @@ static int mlx5_ib_set_cc_params(struct mlx5_ib_dev *dev, int offset, u32 var)
 	MLX5_SET(field_select_r_roce_rp, field, field_select_r_roce_rp,
 		 attr_mask);
 
-	err = mlx5_cmd_modify_cong_params(dev->mdev, in, inlen);
+	err = mlx5_cmd_modify_cong_params(mdev, in, inlen);
 	kvfree(in);
+alloc_err:
+	mlx5_ib_put_native_port_mdev(dev, port_num + 1);
 	return err;
 }
 
@@ -324,7 +346,7 @@ static ssize_t set_param(struct file *filp, const char __user *buf,
 	if (kstrtou32(lbuf, 0, &var))
 		return -EINVAL;
 
-	ret = mlx5_ib_set_cc_params(param->dev, offset, var);
+	ret = mlx5_ib_set_cc_params(param->dev, param->port_num, offset, var);
 	return ret ? ret : count;
 }
 
@@ -340,7 +362,7 @@ static ssize_t get_param(struct file *filp, char __user *buf, size_t count,
 	if (*pos)
 		return 0;
 
-	ret = mlx5_ib_get_cc_params(param->dev, offset, &var);
+	ret = mlx5_ib_get_cc_params(param->dev, param->port_num, offset, &var);
 	if (ret)
 		return ret;
 
@@ -362,44 +384,51 @@ static const struct file_operations dbg_cc_fops = {
 	.read	= get_param,
 };
 
-void mlx5_ib_cleanup_cong_debugfs(struct mlx5_ib_dev *dev)
+void mlx5_ib_cleanup_cong_debugfs(struct mlx5_ib_dev *dev, u8 port_num)
 {
 	if (!mlx5_debugfs_root ||
-	    !dev->dbg_cc_params ||
-	    !dev->dbg_cc_params->root)
+	    !dev->port[port_num].dbg_cc_params ||
+	    !dev->port[port_num].dbg_cc_params->root)
 		return;
 
-	debugfs_remove_recursive(dev->dbg_cc_params->root);
-	kfree(dev->dbg_cc_params);
-	dev->dbg_cc_params = NULL;
+	debugfs_remove_recursive(dev->port[port_num].dbg_cc_params->root);
+	kfree(dev->port[port_num].dbg_cc_params);
+	dev->port[port_num].dbg_cc_params = NULL;
 }
 
-int mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev)
+int mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev, u8 port_num)
 {
 	struct mlx5_ib_dbg_cc_params *dbg_cc_params;
+	struct mlx5_core_dev *mdev;
 	int i;
 
 	if (!mlx5_debugfs_root)
 		goto out;
 
-	if (!MLX5_CAP_GEN(dev->mdev, cc_query_allowed) ||
-	    !MLX5_CAP_GEN(dev->mdev, cc_modify_allowed))
+	/* Takes a 1-based port number */
+	mdev = mlx5_ib_get_native_port_mdev(dev, port_num + 1, NULL);
+	if (!mdev)
 		goto out;
 
+	if (!MLX5_CAP_GEN(mdev, cc_query_allowed) ||
+	    !MLX5_CAP_GEN(mdev, cc_modify_allowed))
+		goto put_mdev;
+
 	dbg_cc_params = kzalloc(sizeof(*dbg_cc_params), GFP_KERNEL);
 	if (!dbg_cc_params)
-		goto out;
+		goto err;
 
-	dev->dbg_cc_params = dbg_cc_params;
+	dev->port[port_num].dbg_cc_params = dbg_cc_params;
 
 	dbg_cc_params->root = debugfs_create_dir("cc_params",
-						 dev->mdev->priv.dbg_root);
+						 mdev->priv.dbg_root);
 	if (!dbg_cc_params->root)
 		goto err;
 
 	for (i = 0; i < MLX5_IB_DBG_CC_MAX; i++) {
 		dbg_cc_params->params[i].offset = i;
 		dbg_cc_params->params[i].dev = dev;
+		dbg_cc_params->params[i].port_num = port_num;
 		dbg_cc_params->params[i].dentry =
 			debugfs_create_file(mlx5_ib_dbg_cc_name[i],
 					    0600, dbg_cc_params->root,
@@ -408,11 +437,17 @@ int mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev)
 		if (!dbg_cc_params->params[i].dentry)
 			goto err;
 	}
-out:	return 0;
+
+put_mdev:
+	mlx5_ib_put_native_port_mdev(dev, port_num + 1);
+out:
+	return 0;
 
 err:
 	mlx5_ib_warn(dev, "cong debugfs failure\n");
-	mlx5_ib_cleanup_cong_debugfs(dev);
+	mlx5_ib_cleanup_cong_debugfs(dev, port_num);
+	mlx5_ib_put_native_port_mdev(dev, port_num + 1);
+
 	/*
 	 * We don't want to fail driver if debugfs failed to initialize,
 	 * so we are not forwarding error to the user.
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 59e26901d935..2ced365e8247 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4254,6 +4254,8 @@ static void mlx5_ib_unbind_slave_port(struct mlx5_ib_dev *ibdev,
 	int err;
 	int i;
 
+	mlx5_ib_cleanup_cong_debugfs(ibdev, port_num);
+
 	spin_lock(&port->mp.mpi_lock);
 	if (!mpi->ibdev) {
 		spin_unlock(&port->mp.mpi_lock);
@@ -4331,6 +4333,10 @@ static bool mlx5_ib_bind_slave_port(struct mlx5_ib_dev *ibdev,
 		goto unbind;
 	}
 
+	err = mlx5_ib_init_cong_debugfs(ibdev, port_num);
+	if (err)
+		goto unbind;
+
 	return true;
 
 unbind:
@@ -4748,12 +4754,14 @@ static void mlx5_ib_stage_counters_cleanup(struct mlx5_ib_dev *dev)
 
 static int mlx5_ib_stage_cong_debugfs_init(struct mlx5_ib_dev *dev)
 {
-	return mlx5_ib_init_cong_debugfs(dev);
+	return mlx5_ib_init_cong_debugfs(dev,
+					 mlx5_core_native_port_num(dev->mdev) - 1);
 }
 
 static void mlx5_ib_stage_cong_debugfs_cleanup(struct mlx5_ib_dev *dev)
 {
-	mlx5_ib_cleanup_cong_debugfs(dev);
+	mlx5_ib_cleanup_cong_debugfs(dev,
+				     mlx5_core_native_port_num(dev->mdev) - 1);
 }
 
 static int mlx5_ib_stage_uar_init(struct mlx5_ib_dev *dev)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index e1667e72ad49..637fdc95f1ee 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -665,6 +665,7 @@ struct mlx5_ib_multiport {
 struct mlx5_ib_port {
 	struct mlx5_ib_counters cnts;
 	struct mlx5_ib_multiport mp;
+	struct mlx5_ib_dbg_cc_params	*dbg_cc_params;
 };
 
 struct mlx5_roce {
@@ -684,6 +685,7 @@ struct mlx5_ib_dbg_param {
 	int			offset;
 	struct mlx5_ib_dev	*dev;
 	struct dentry		*dentry;
+	u8			port_num;
 };
 
 enum mlx5_ib_dbg_cc_types {
@@ -825,7 +827,6 @@ struct mlx5_ib_dev {
 	struct mlx5_sq_bfreg	bfreg;
 	struct mlx5_sq_bfreg	fp_bfreg;
 	struct mlx5_ib_delay_drop	delay_drop;
-	struct mlx5_ib_dbg_cc_params	*dbg_cc_params;
 	const struct mlx5_ib_profile	*profile;
 
 	/* protect the user_td */
@@ -1083,8 +1084,8 @@ __be16 mlx5_get_roce_udp_sport(struct mlx5_ib_dev *dev, u8 port_num,
 int mlx5_get_roce_gid_type(struct mlx5_ib_dev *dev, u8 port_num,
 			   int index, enum ib_gid_type *gid_type);
 
-void mlx5_ib_cleanup_cong_debugfs(struct mlx5_ib_dev *dev);
-int mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev);
+void mlx5_ib_cleanup_cong_debugfs(struct mlx5_ib_dev *dev, u8 port_num);
+int mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev, u8 port_num);
 
 /* GSI QP helper functions */
 struct ib_qp *mlx5_ib_gsi_create_qp(struct ib_pd *pd,
-- 
2.15.1


* [PATCH rdma-next v2 10/14] IB/mlx5: Update counter implementation for dual port RoCE
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (8 preceding siblings ...)
  2018-01-04 15:25   ` [PATCH rdma-next v2 09/14] IB/mlx5: Change debugfs to have per port contents Leon Romanovsky
@ 2018-01-04 15:25   ` Leon Romanovsky
  2018-01-04 15:25   ` [PATCH rdma-next v2 11/14] {net,IB}/mlx5: Change set_roce_gid to take a port number Leon Romanovsky
                     ` (5 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Update the counter interface for multiple ports. Some counter sets
always come from the primary device.

Port-specific counters should be accessed through their own
mlx5_core_dev, not always through the IB master mdev.
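
A sketch of the split made below, assuming the helpers as the hunks
define them; the per-port counter read itself is elided:

static int example_get_hw_stats(struct mlx5_ib_dev *dev,
				struct mlx5_ib_port *port,
				struct rdma_hw_stats *stats, u8 port_num)
{
	struct mlx5_core_dev *mdev;
	u8 mdev_port_num;
	int ret;

	/* q_counters belong to the IB device: always query the master. */
	ret = mlx5_ib_query_q_counters(dev->mdev, port, stats);
	if (ret)
		return ret;

	/* Port counters are only meaningful once the port is
	 * affiliated; an unaffiliated port is down and reads as zero.
	 */
	mdev = mlx5_ib_get_native_port_mdev(dev, port_num, &mdev_port_num);
	if (!mdev)
		return 0;

	/* ... read the port's congestion counters here ... */

	mlx5_ib_put_native_port_mdev(dev, port_num);
	return 0;
}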

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c    | 69 +++++++++++++++++++++---------------
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
 2 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 2ced365e8247..4791d747cc57 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3885,11 +3885,12 @@ static const struct mlx5_ib_counter extended_err_cnts[] = {
 
 static void mlx5_ib_dealloc_counters(struct mlx5_ib_dev *dev)
 {
-	unsigned int i;
+	int i;
 
 	for (i = 0; i < dev->num_ports; i++) {
-		mlx5_core_dealloc_q_counter(dev->mdev,
-					    dev->port[i].cnts.set_id);
+		if (dev->port[i].cnts.set_id)
+			mlx5_core_dealloc_q_counter(dev->mdev,
+						    dev->port[i].cnts.set_id);
 		kfree(dev->port[i].cnts.names);
 		kfree(dev->port[i].cnts.offsets);
 	}
@@ -3931,6 +3932,7 @@ static int __mlx5_ib_alloc_counters(struct mlx5_ib_dev *dev,
 
 err_names:
 	kfree(cnts->names);
+	cnts->names = NULL;
 	return -ENOMEM;
 }
 
@@ -3977,37 +3979,33 @@ static void mlx5_ib_fill_counters(struct mlx5_ib_dev *dev,
 
 static int mlx5_ib_alloc_counters(struct mlx5_ib_dev *dev)
 {
+	int err = 0;
 	int i;
-	int ret;
 
 	for (i = 0; i < dev->num_ports; i++) {
-		struct mlx5_ib_port *port = &dev->port[i];
+		err = __mlx5_ib_alloc_counters(dev, &dev->port[i].cnts);
+		if (err)
+			goto err_alloc;
+
+		mlx5_ib_fill_counters(dev, dev->port[i].cnts.names,
+				      dev->port[i].cnts.offsets);
 
-		ret = mlx5_core_alloc_q_counter(dev->mdev,
-						&port->cnts.set_id);
-		if (ret) {
+		err = mlx5_core_alloc_q_counter(dev->mdev,
+						&dev->port[i].cnts.set_id);
+		if (err) {
 			mlx5_ib_warn(dev,
 				     "couldn't allocate queue counter for port %d, err %d\n",
-				     i + 1, ret);
-			goto dealloc_counters;
+				     i + 1, err);
+			goto err_alloc;
 		}
-
-		ret = __mlx5_ib_alloc_counters(dev, &port->cnts);
-		if (ret)
-			goto dealloc_counters;
-
-		mlx5_ib_fill_counters(dev, port->cnts.names,
-				      port->cnts.offsets);
+		dev->port[i].cnts.set_id_valid = true;
 	}
 
 	return 0;
 
-dealloc_counters:
-	while (--i >= 0)
-		mlx5_core_dealloc_q_counter(dev->mdev,
-					    dev->port[i].cnts.set_id);
-
-	return ret;
+err_alloc:
+	mlx5_ib_dealloc_counters(dev);
+	return err;
 }
 
 static struct rdma_hw_stats *mlx5_ib_alloc_hw_stats(struct ib_device *ibdev,
@@ -4026,7 +4024,7 @@ static struct rdma_hw_stats *mlx5_ib_alloc_hw_stats(struct ib_device *ibdev,
 					  RDMA_HW_STATS_DEFAULT_LIFESPAN);
 }
 
-static int mlx5_ib_query_q_counters(struct mlx5_ib_dev *dev,
+static int mlx5_ib_query_q_counters(struct mlx5_core_dev *mdev,
 				    struct mlx5_ib_port *port,
 				    struct rdma_hw_stats *stats)
 {
@@ -4039,7 +4037,7 @@ static int mlx5_ib_query_q_counters(struct mlx5_ib_dev *dev,
 	if (!out)
 		return -ENOMEM;
 
-	ret = mlx5_core_query_q_counter(dev->mdev,
+	ret = mlx5_core_query_q_counter(mdev,
 					port->cnts.set_id, 0,
 					out, outlen);
 	if (ret)
@@ -4061,28 +4059,43 @@ static int mlx5_ib_get_hw_stats(struct ib_device *ibdev,
 {
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	struct mlx5_ib_port *port = &dev->port[port_num - 1];
+	struct mlx5_core_dev *mdev;
 	int ret, num_counters;
+	u8 mdev_port_num;
 
 	if (!stats)
 		return -EINVAL;
 
-	ret = mlx5_ib_query_q_counters(dev, port, stats);
+	num_counters = port->cnts.num_q_counters + port->cnts.num_cong_counters;
+
+	/* q_counters are per IB device, query the master mdev */
+	ret = mlx5_ib_query_q_counters(dev->mdev, port, stats);
 	if (ret)
 		return ret;
-	num_counters = port->cnts.num_q_counters;
 
 	if (MLX5_CAP_GEN(dev->mdev, cc_query_allowed)) {
+		mdev = mlx5_ib_get_native_port_mdev(dev, port_num,
+						    &mdev_port_num);
+		if (!mdev) {
+			/* If the port is not affiliated yet, it's in down state
+			 * which doesn't have any counters yet, so it would be
+			 * zero. So no need to read from the HCA.
+			 */
+			goto done;
+		}
 		ret = mlx5_lag_query_cong_counters(dev->mdev,
 						   stats->value +
 						   port->cnts.num_q_counters,
 						   port->cnts.num_cong_counters,
 						   port->cnts.offsets +
 						   port->cnts.num_q_counters);
+
+		mlx5_ib_put_native_port_mdev(dev, port_num);
 		if (ret)
 			return ret;
-		num_counters += port->cnts.num_cong_counters;
 	}
 
+done:
 	return num_counters;
 }
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 637fdc95f1ee..283b5f94898e 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -652,6 +652,7 @@ struct mlx5_ib_counters {
 	u32 num_q_counters;
 	u32 num_cong_counters;
 	u16 set_id;
+	bool set_id_valid;
 };
 
 struct mlx5_ib_multiport_info;
-- 
2.15.1


* [PATCH rdma-next v2 11/14] {net,IB}/mlx5: Change set_roce_gid to take a port number
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (9 preceding siblings ...)
  2018-01-04 15:25   ` [PATCH rdma-next v2 10/14] IB/mlx5: Update counter implementation for dual port RoCE Leon Romanovsky
@ 2018-01-04 15:25   ` Leon Romanovsky
  2018-01-04 15:25   ` [PATCH rdma-next v2 12/14] IB/mlx5: Route MADs for dual port RoCE Leon Romanovsky
                     ` (4 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When in dual port mode, setting a RoCE GID for any port flows through
the master port's mlx5_core_dev. Provide an interface to specify the
port when sending this command.
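
A sketch of a caller after the signature change; example_set_gid() is
illustrative, the argument list is the one from the hunks below:

static int example_set_gid(struct mlx5_core_dev *mdev, unsigned int index,
			   const u8 *gid, const u8 *mac, u8 port_num)
{
	/* Same arguments as before plus the trailing vhca port number;
	 * the core only writes vhca_port_num into the command when the
	 * device reports num_vhca_ports > 0, so single port devices
	 * are unaffected.
	 */
	return mlx5_core_roce_gid_set(mdev, index, MLX5_ROCE_VERSION_2,
				      MLX5_ROCE_L3_TYPE_IPV6, gid, mac,
				      false, 0, port_num);
}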

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c                   | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c | 7 ++++---
 drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c   | 5 ++++-
 include/linux/mlx5/driver.h                         | 2 +-
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 4791d747cc57..653b56377e69 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -475,7 +475,7 @@ static int set_roce_addr(struct mlx5_ib_dev *dev, u8 port_num,
 
 	return mlx5_core_roce_gid_set(dev->mdev, index, roce_version,
 				      roce_l3_type, gid->raw, mac, vlan,
-				      vlan_id);
+				      vlan_id, port_num);
 }
 
 static int mlx5_ib_add_gid(struct ib_device *device, u8 port_num,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
index c841b03c3e48..e6175f8ac0e4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
@@ -888,7 +888,8 @@ struct mlx5_fpga_conn *mlx5_fpga_conn_create(struct mlx5_fpga_device *fdev,
 	err = mlx5_core_roce_gid_set(fdev->mdev, conn->qp.sgid_index,
 				     MLX5_ROCE_VERSION_2,
 				     MLX5_ROCE_L3_TYPE_IPV6,
-				     remote_ip, remote_mac, true, 0);
+				     remote_ip, remote_mac, true, 0,
+				     MLX5_FPGA_PORT_NUM);
 	if (err) {
 		mlx5_fpga_err(fdev, "Failed to set SGID: %d\n", err);
 		ret = ERR_PTR(err);
@@ -954,7 +955,7 @@ struct mlx5_fpga_conn *mlx5_fpga_conn_create(struct mlx5_fpga_device *fdev,
 	mlx5_fpga_conn_destroy_cq(conn);
 err_gid:
 	mlx5_core_roce_gid_set(fdev->mdev, conn->qp.sgid_index, 0, 0, NULL,
-			       NULL, false, 0);
+			       NULL, false, 0, MLX5_FPGA_PORT_NUM);
 err_rsvd_gid:
 	mlx5_core_reserved_gid_free(fdev->mdev, conn->qp.sgid_index);
 err:
@@ -982,7 +983,7 @@ void mlx5_fpga_conn_destroy(struct mlx5_fpga_conn *conn)
 	mlx5_fpga_conn_destroy_cq(conn);
 
 	mlx5_core_roce_gid_set(conn->fdev->mdev, conn->qp.sgid_index, 0, 0,
-			       NULL, NULL, false, 0);
+			       NULL, NULL, false, 0, MLX5_FPGA_PORT_NUM);
 	mlx5_core_reserved_gid_free(conn->fdev->mdev, conn->qp.sgid_index);
 	kfree(conn);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c
index 573f59f46d41..7722a3f9bb68 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c
@@ -121,7 +121,7 @@ EXPORT_SYMBOL_GPL(mlx5_core_reserved_gids_count);
 
 int mlx5_core_roce_gid_set(struct mlx5_core_dev *dev, unsigned int index,
 			   u8 roce_version, u8 roce_l3_type, const u8 *gid,
-			   const u8 *mac, bool vlan, u16 vlan_id)
+			   const u8 *mac, bool vlan, u16 vlan_id, u8 port_num)
 {
 #define MLX5_SET_RA(p, f, v) MLX5_SET(roce_addr_layout, p, f, v)
 	u32  in[MLX5_ST_SZ_DW(set_roce_address_in)] = {0};
@@ -148,6 +148,9 @@ int mlx5_core_roce_gid_set(struct mlx5_core_dev *dev, unsigned int index,
 		memcpy(addr_l3_addr, gid, gidsz);
 	}
 
+	if (MLX5_CAP_GEN(dev, num_vhca_ports) > 0)
+		MLX5_SET(set_roce_address_in, in, vhca_port_num, port_num);
+
 	MLX5_SET(set_roce_address_in, in, roce_address_index, index);
 	MLX5_SET(set_roce_address_in, in, opcode, MLX5_CMD_OP_SET_ROCE_ADDRESS);
 	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index d5c787519e06..9136e35f2f7e 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1112,7 +1112,7 @@ void mlx5_free_bfreg(struct mlx5_core_dev *mdev, struct mlx5_sq_bfreg *bfreg);
 unsigned int mlx5_core_reserved_gids_count(struct mlx5_core_dev *dev);
 int mlx5_core_roce_gid_set(struct mlx5_core_dev *dev, unsigned int index,
 			   u8 roce_version, u8 roce_l3_type, const u8 *gid,
-			   const u8 *mac, bool vlan, u16 vlan_id);
+			   const u8 *mac, bool vlan, u16 vlan_id, u8 port_num);
 
 static inline int fw_initializing(struct mlx5_core_dev *dev)
 {
-- 
2.15.1


* [PATCH rdma-next v2 12/14] IB/mlx5: Route MADs for dual port RoCE
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (10 preceding siblings ...)
  2018-01-04 15:25   ` [PATCH rdma-next v2 11/14] {net,IB}/mlx5: Change set_roce_gid to take a port number Leon Romanovsky
@ 2018-01-04 15:25   ` Leon Romanovsky
  2018-01-04 15:25   ` [PATCH rdma-next v2 13/14] IB/mlx5: Don't advertise RAW QP support in dual port mode Leon Romanovsky
                     ` (3 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Route performance query MADs to the correct mlx5_core_dev when using
dual port RoCE mode.
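
Unlike the query routines earlier in the series, there is no fall-back
to the master mdev here: if the native port's mdev can't be taken the
MAD is failed. A sketch (example_process_mad() is illustrative):

static int example_process_mad(struct mlx5_ib_dev *dev, u8 port_num)
{
	struct mlx5_core_dev *mdev;
	u8 mdev_port_num;
	int ret;

	mdev = mlx5_ib_get_native_port_mdev(dev, port_num, &mdev_port_num);
	if (!mdev)
		return IB_MAD_RESULT_FAILURE;

	ret = 0; /* ... run the command against (mdev, mdev_port_num) ... */

	mlx5_ib_put_native_port_mdev(dev, port_num);
	return ret;
}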

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/mad.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index 0559e0a9e398..32a9e9228b13 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -197,10 +197,9 @@ static void pma_cnt_assign(struct ib_pma_portcounters *pma_cnt,
 			     vl_15_dropped);
 }
 
-static int process_pma_cmd(struct ib_device *ibdev, u8 port_num,
+static int process_pma_cmd(struct mlx5_core_dev *mdev, u8 port_num,
 			   const struct ib_mad *in_mad, struct ib_mad *out_mad)
 {
-	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	int err;
 	void *out_cnt;
 
@@ -222,7 +221,7 @@ static int process_pma_cmd(struct ib_device *ibdev, u8 port_num,
 		if (!out_cnt)
 			return IB_MAD_RESULT_FAILURE;
 
-		err = mlx5_core_query_vport_counter(dev->mdev, 0, 0,
+		err = mlx5_core_query_vport_counter(mdev, 0, 0,
 						    port_num, out_cnt, sz);
 		if (!err)
 			pma_cnt_ext_assign(pma_cnt_ext, out_cnt);
@@ -235,7 +234,7 @@ static int process_pma_cmd(struct ib_device *ibdev, u8 port_num,
 		if (!out_cnt)
 			return IB_MAD_RESULT_FAILURE;
 
-		err = mlx5_core_query_ib_ppcnt(dev->mdev, port_num,
+		err = mlx5_core_query_ib_ppcnt(mdev, port_num,
 					       out_cnt, sz);
 		if (!err)
 			pma_cnt_assign(pma_cnt, out_cnt);
@@ -255,9 +254,11 @@ int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 			u16 *out_mad_pkey_index)
 {
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
-	struct mlx5_core_dev *mdev = dev->mdev;
 	const struct ib_mad *in_mad = (const struct ib_mad *)in;
 	struct ib_mad *out_mad = (struct ib_mad *)out;
+	struct mlx5_core_dev *mdev;
+	u8 mdev_port_num;
+	int ret;
 
 	if (WARN_ON_ONCE(in_mad_size != sizeof(*in_mad) ||
 			 *out_mad_size != sizeof(*out_mad)))
@@ -265,14 +266,20 @@ int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 
 	memset(out_mad->data, 0, sizeof(out_mad->data));
 
+	mdev = mlx5_ib_get_native_port_mdev(dev, port_num, &mdev_port_num);
+	if (!mdev)
+		return IB_MAD_RESULT_FAILURE;
+
 	if (MLX5_CAP_GEN(mdev, vport_counters) &&
 	    in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_PERF_MGMT &&
 	    in_mad->mad_hdr.method == IB_MGMT_METHOD_GET) {
-		return process_pma_cmd(ibdev, port_num, in_mad, out_mad);
+		ret = process_pma_cmd(mdev, mdev_port_num, in_mad, out_mad);
 	} else {
-		return process_mad(ibdev, mad_flags, port_num, in_wc, in_grh,
+		ret =  process_mad(ibdev, mad_flags, port_num, in_wc, in_grh,
 				   in_mad, out_mad);
 	}
+	mlx5_ib_put_native_port_mdev(dev, port_num);
+	return ret;
 }
 
 int mlx5_query_ext_port_caps(struct mlx5_ib_dev *dev, u8 port)
-- 
2.15.1


* [PATCH rdma-next v2 13/14] IB/mlx5: Don't advertise RAW QP support in dual port mode
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (11 preceding siblings ...)
  2018-01-04 15:25   ` [PATCH rdma-next v2 12/14] IB/mlx5: Route MADs for dual port RoCE Leon Romanovsky
@ 2018-01-04 15:25   ` Leon Romanovsky
  2018-01-04 15:25   ` [PATCH rdma-next v2 14/14] net/mlx5: Set num_vhca_ports capability Leon Romanovsky
                     ` (2 subsequent siblings)
  15 siblings, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When operating in dual port RoCE mode, firmware doesn't support
steering for raw QPs on the slave port. They still work on the master
port, but the capability is reported per device, not per port, so the
user has no way of knowing which port is the master. Don't advertise
raw QP support at all in this mode.
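
A compressed sketch of the gating applied throughout the hunks below;
mlx5_core_mp_enabled() is the helper added earlier in this series and
example_set_raw_caps() is illustrative:

static void example_set_raw_caps(struct mlx5_ib_dev *dev,
				 struct ib_device_attr *props)
{
	bool raw_support = !mlx5_core_mp_enabled(dev->mdev);

	/* No raw packet capabilities are advertised in dual port RoCE
	 * mode, since steering only works on the master port.
	 */
	if (!raw_support)
		return;

	if (MLX5_CAP_ETH(dev->mdev, scatter_fcs))
		props->raw_packet_caps |= IB_RAW_PACKET_CAP_SCATTER_FCS;
}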

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 653b56377e69..5d6fba986fa5 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -731,6 +731,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	int max_rq_sg;
 	int max_sq_sg;
 	u64 min_page_size = 1ull << MLX5_CAP_GEN(mdev, log_pg_sz);
+	bool raw_support = !mlx5_core_mp_enabled(mdev);
 	struct mlx5_ib_query_device_resp resp = {};
 	size_t resp_len;
 	u64 max_tso;
@@ -794,7 +795,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	if (MLX5_CAP_GEN(mdev, block_lb_mc))
 		props->device_cap_flags |= IB_DEVICE_BLOCK_MULTICAST_LOOPBACK;
 
-	if (MLX5_CAP_GEN(dev->mdev, eth_net_offloads)) {
+	if (MLX5_CAP_GEN(dev->mdev, eth_net_offloads) && raw_support) {
 		if (MLX5_CAP_ETH(mdev, csum_cap)) {
 			/* Legacy bit to support old userspace libraries */
 			props->device_cap_flags |= IB_DEVICE_RAW_IP_CSUM;
@@ -843,7 +844,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	}
 
 	if (MLX5_CAP_GEN(dev->mdev, rq_delay_drop) &&
-	    MLX5_CAP_GEN(dev->mdev, general_notification_event))
+	    MLX5_CAP_GEN(dev->mdev, general_notification_event) &&
+	    raw_support)
 		props->raw_packet_caps |= IB_RAW_PACKET_CAP_DELAY_DROP;
 
 	if (MLX5_CAP_GEN(mdev, ipoib_enhanced_offloads) &&
@@ -851,7 +853,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 		props->device_cap_flags |= IB_DEVICE_UD_IP_CSUM;
 
 	if (MLX5_CAP_GEN(dev->mdev, eth_net_offloads) &&
-	    MLX5_CAP_ETH(dev->mdev, scatter_fcs)) {
+	    MLX5_CAP_ETH(dev->mdev, scatter_fcs) &&
+	    raw_support) {
 		/* Legacy bit to support old userspace libraries */
 		props->device_cap_flags |= IB_DEVICE_RAW_SCATTER_FCS;
 		props->raw_packet_caps |= IB_RAW_PACKET_CAP_SCATTER_FCS;
@@ -915,7 +918,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 		props->device_cap_flags |= IB_DEVICE_VIRTUAL_FUNCTION;
 
 	if (mlx5_ib_port_link_layer(ibdev, 1) ==
-	    IB_LINK_LAYER_ETHERNET) {
+	    IB_LINK_LAYER_ETHERNET && raw_support) {
 		props->rss_caps.max_rwq_indirection_tables =
 			1 << MLX5_CAP_GEN(dev->mdev, log_max_rqt);
 		props->rss_caps.max_rwq_indirection_table_size =
@@ -952,7 +955,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 		resp.response_length += sizeof(resp.cqe_comp_caps);
 	}
 
-	if (field_avail(typeof(resp), packet_pacing_caps, uhw->outlen)) {
+	if (field_avail(typeof(resp), packet_pacing_caps, uhw->outlen) &&
+	    raw_support) {
 		if (MLX5_CAP_QOS(mdev, packet_pacing) &&
 		    MLX5_CAP_GEN(mdev, qos)) {
 			resp.packet_pacing_caps.qp_rate_limit_max =
@@ -1011,7 +1015,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 		}
 	}
 
-	if (field_avail(typeof(resp), striding_rq_caps, uhw->outlen)) {
+	if (field_avail(typeof(resp), striding_rq_caps, uhw->outlen) &&
+	    raw_support) {
 		resp.response_length += sizeof(resp.striding_rq_caps);
 		if (MLX5_CAP_GEN(mdev, striding_rq)) {
 			resp.striding_rq_caps.min_single_stride_log_num_of_bytes =
@@ -3681,12 +3686,14 @@ static u32 get_core_cap_flags(struct ib_device *ibdev)
 	enum rdma_link_layer ll = mlx5_ib_port_link_layer(ibdev, 1);
 	u8 l3_type_cap = MLX5_CAP_ROCE(dev->mdev, l3_type);
 	u8 roce_version_cap = MLX5_CAP_ROCE(dev->mdev, roce_version);
+	bool raw_support = !mlx5_core_mp_enabled(dev->mdev);
 	u32 ret = 0;
 
 	if (ll == IB_LINK_LAYER_INFINIBAND)
 		return RDMA_CORE_PORT_IBA_IB;
 
-	ret = RDMA_CORE_PORT_RAW_PACKET;
+	if (raw_support)
+		ret = RDMA_CORE_PORT_RAW_PACKET;
 
 	if (!(l3_type_cap & MLX5_ROCE_L3_TYPE_IPV4_CAP))
 		return ret;
-- 
2.15.1


* [PATCH rdma-next v2 14/14] net/mlx5: Set num_vhca_ports capability
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (12 preceding siblings ...)
  2018-01-04 15:25   ` [PATCH rdma-next v2 13/14] IB/mlx5: Don't advertise RAW QP support in dual port mode Leon Romanovsky
@ 2018-01-04 15:25   ` Leon Romanovsky
  2018-01-04 19:53   ` [PATCH rdma-next v2 00/14] index e382a3ca759e..d4a471a76d82 100644 Or Gerlitz
  2018-01-09  2:51   ` Jason Gunthorpe
  15 siblings, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-04 15:25 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Set the current num_vhca_ports capability to the maximum capability
reported by the firmware. Doing so enables dual port RoCE functionality
when the firmware supports it.

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index e382a3ca759e..d4a471a76d82 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -557,6 +557,12 @@ static int handle_hca_cap(struct mlx5_core_dev *dev)
 	if (MLX5_CAP_GEN_MAX(dev, dct))
 		MLX5_SET(cmd_hca_cap, set_hca_cap, dct, 1);
 
+	if (MLX5_CAP_GEN_MAX(dev, num_vhca_ports))
+		MLX5_SET(cmd_hca_cap,
+			 set_hca_cap,
+			 num_vhca_ports,
+			 MLX5_CAP_GEN_MAX(dev, num_vhca_ports));
+
 	err = set_caps(dev, set_ctx, set_sz,
 		       MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE);
 
-- 
2.15.1


* Re: [PATCH rdma-next v2 00/14] index e382a3ca759e..d4a471a76d82 100644
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (13 preceding siblings ...)
  2018-01-04 15:25   ` [PATCH rdma-next v2 14/14] net/mlx5: Set num_vhca_ports capability Leon Romanovsky
@ 2018-01-04 19:53   ` Or Gerlitz
       [not found]     ` <CAJ3xEMjptQf99ZdGxkB6kTh9LkQ7-pjiObMh+knDWXFStPOg7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2018-01-09  2:51   ` Jason Gunthorpe
  15 siblings, 1 reply; 26+ messages in thread
From: Or Gerlitz @ 2018-01-04 19:53 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Doug Ledford, Jason Gunthorpe, RDMA mailing list, Parav Pandit,
	Leon Romanovsky

On Thu, Jan 4, 2018 at 5:25 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>  v1 -> v2:
>   * Dropped "IB/mlx5: Use correct mdev for vport queries in ib_virt"
>  v0 -> v1:
>   * Rebased to latest rdma/for-next
>   * Enriched commit messages.

The V2 post LGTM re the points I was commenting on: the IB virt patch
was removed, and the cover letter + change-log properly elaborate on
how the feature is configured and what the resulting IB devices would
be under the different variations. See one followup below:

[..]

> SR-IOV devices follow the same pattern as the physical ones. VFs of a
> master port can bind VFs of slave ports, if available, and operate as
> dual port devices.
>
> Examples of devices passed to a VM:
> (master)         - One net device, one IB device that has two ports. The slave
>                    port will always be down.
> (slave)          - One net device, no IB devices.
> (slave, slave)   - Two net devices and no IB devices.
> (master, master) - Two net devices, two IB devices, each with two ports.
>                    The slave port of each device will always be down.
> (master, slave)  - Two net devices, one IB device, with two ports. Both
>                     ports can be used.
>
> There are no changes to the existing design for net devices.
>
> The feature is disabled by default and it is enabled in firmware with mlxconfig.

Dan, so what happens w.r.t. the different combinations specified above
if, on an SR-IOV setup where the admin enabled the feature, VFs land in
VMs whose mlx5 driver doesn't have these changes?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH rdma-next v2 00/14] index e382a3ca759e..d4a471a76d82 100644
       [not found]     ` <CAJ3xEMjptQf99ZdGxkB6kTh9LkQ7-pjiObMh+knDWXFStPOg7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-04 21:50       ` Daniel Jurgens
       [not found]         ` <2618f285-6fa4-06ac-4b4a-51d0b2660b61-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Jurgens @ 2018-01-04 21:50 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, Jason Gunthorpe, RDMA mailing list, Parav Pandit,
	Leon Romanovsky

On 1/4/2018 1:53 PM, Or Gerlitz wrote:
> On Thu, Jan 4, 2018 at 5:25 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>  v1 -> v2:
>>   * Dropped "IB/mlx5: Use correct mdev for vport queries in ib_virt"
>>  v0 -> v1:
>>   * Rebased to latest rdma/for-next
>>   * Enriched commit messages.
> The V2 post LGTM re the points I was commenting on: the IB virt patch
> was removed, and the cover letter + change-log properly elaborate on
> how the feature is configured and what the resulting IB devices would
> be under the different variations. See one followup below:
>
> [..]
>
>> SR-IOV devices follow the same pattern as the physical ones. VFs of a
>> master port can bind VFs of slave ports, if available, and operate as
>> dual port devices.
>>
>> Examples of devices passed to a VM:
>> (master)         - One net device, one IB device that has two ports. The slave
>>                    port will always be down.
>> (slave)          - One net device, no IB devices.
>> (slave, slave)   - Two net devices and no IB devices.
>> (master, master) - Two net devices, two IB devices, each with two ports.
>>                    The slave port of each device will always be down.
>> (master, slave)  - Two net devices, one IB device, with two ports. Both
>>                     ports can be used.
>>
>> There are no changes to the existing design for net devices.
>>
>> The feature is disabled by default and it is enabled in firmware with mlxconfig.
> Dan, so what happens w.r.t. the different combinations specified above
> if, on an SR-IOV setup where the admin enabled the feature, VFs land in
> VMs whose mlx5 driver doesn't have these changes?

In this case it would look the same as it does today: one IB device per PCI device, with one port each. The old driver doesn't know about the new capabilities, so it wouldn't bind the ports. Operating this way isn't ideal, though. In DPR mode the FW biases the schedule queue allocation toward the master port, so performance could be worse on the VFs that would otherwise be slave ports if they are used as normal IB devices.
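
To make that concrete, a minimal sketch of the kind of capability gating
involved (the helper name here is hypothetical; num_vhca_ports is the
capability an unmodified driver never reads):

	/*
	 * Hypothetical helper, not verbatim from the series: binding a
	 * second port is only attempted when the driver reads and
	 * understands the new capability. An unmodified driver never
	 * performs this check, so each PCI function keeps its own
	 * single-port IB device, as described above.
	 */
	static bool wants_multiport_bind(struct mlx5_core_dev *mdev)
	{
		return MLX5_CAP_GEN(mdev, num_vhca_ports) > 1;
	}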

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH rdma-next v2 06/14] {net,IB}/mlx5: Manage port association for multiport RoCE
       [not found]     ` <20180104152544.28919-7-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2018-01-04 23:33       ` Jason Gunthorpe
       [not found]         ` <20180104233334.GA18993-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Gunthorpe @ 2018-01-04 23:33 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Daniel Jurgens, Parav Pandit

> +struct mlx5_ib_odp {
> +	struct ib_odp_caps	caps;
> +	u64			max_size;
> +	/*
> +	 * Sleepable RCU that prevents destruction of MRs while they are still
> +	 * being used by a page fault handler.
> +	 */
> +	struct srcu_struct      mr_srcu;
> +	u32			null_mkey;
> +	void			(*sync)(struct mlx5_ib_dev *dev);
> +};

Well this doesn't belong in here. rebase error or something?

Someone needs to go over these patches more carefully..

Jason

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH rdma-next v2 06/14] {net,IB}/mlx5: Manage port association for multiport RoCE
       [not found]         ` <20180104233334.GA18993-uk2M96/98Pc@public.gmane.org>
@ 2018-01-05  6:36           ` Leon Romanovsky
  2018-01-05 13:47           ` Daniel Jurgens
  1 sibling, 0 replies; 26+ messages in thread
From: Leon Romanovsky @ 2018-01-05  6:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Daniel Jurgens, Parav Pandit

On Thu, Jan 04, 2018 at 04:33:34PM -0700, Jason Gunthorpe wrote:
> > +struct mlx5_ib_odp {
> > +	struct ib_odp_caps	caps;
> > +	u64			max_size;
> > +	/*
> > +	 * Sleepable RCU that prevents destruction of MRs while they are still
> > +	 * being used by a page fault handler.
> > +	 */
> > +	struct srcu_struct      mr_srcu;
> > +	u32			null_mkey;
> > +	void			(*sync)(struct mlx5_ib_dev *dev);
> > +};
>
> Well this doesn't belong in here. rebase error or something?

Very strange: this chunk indeed doesn't belong to this series, and
Daniel didn't add it either, but I see this mlx5_ib_odp struct above the
newly added mlx5_ib_multiport_info struct in our review system.

In this series, it comes from my rr-cache, but I don't know why it is in
the review system.

We can safely remove it. Do you want me to respin it?

Thanks

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH rdma-next v2 06/14] {net,IB}/mlx5: Manage port association for multiport RoCE
       [not found]         ` <20180104233334.GA18993-uk2M96/98Pc@public.gmane.org>
  2018-01-05  6:36           ` Leon Romanovsky
@ 2018-01-05 13:47           ` Daniel Jurgens
  1 sibling, 0 replies; 26+ messages in thread
From: Daniel Jurgens @ 2018-01-05 13:47 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Parav Pandit

On 1/4/2018 5:33 PM, Jason Gunthorpe wrote:
>> +struct mlx5_ib_odp {
>> +	struct ib_odp_caps	caps;
>> +	u64			max_size;
>> +	/*
>> +	 * Sleepable RCU that prevents destruction of MRs while they are still
>> +	 * being used by a page fault handler.
>> +	 */
>> +	struct srcu_struct      mr_srcu;
>> +	u32			null_mkey;
>> +	void			(*sync)(struct mlx5_ib_dev *dev);
>> +};
> Well this doesn't belong in here. rebase error or something?
>
> Someone needs to go over these patches more carefully..
>
> Jason

Looks like a rebase error; it's not there in V0.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH rdma-next v2 00/14] index e382a3ca759e..d4a471a76d82 100644
       [not found]         ` <2618f285-6fa4-06ac-4b4a-51d0b2660b61-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2018-01-05 14:59           ` Or Gerlitz
       [not found]             ` <CAJ3xEMhVxBqJUwdRn3BXJPnf9J-_gXtrWrQgun3bCEW7W7Oq1w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Or Gerlitz @ 2018-01-05 14:59 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Doug Ledford, Jason Gunthorpe, RDMA mailing list, Parav Pandit,
	Leon Romanovsky

On Thu, Jan 4, 2018 at 11:50 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On 1/4/2018 1:53 PM, Or Gerlitz wrote:
>> On Thu, Jan 4, 2018 at 5:25 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>  v1 -> v2:
>>>   * Dropped "IB/mlx5: Use correct mdev for vport queries in ib_virt"
>>>  v0 -> v1:
>>>   * Rebased to latest rdma/for-next
>>>   * Enriched commit messages.
>> The V2 post LGTM re the points I was commenting on: the IB virt patch
>> was removed, and the cover letter + change-log properly elaborate on
>> how the feature is configured and what the resulting IB devices would
>> be under the different variations. See one followup below:
>>
>> [..]
>>
>>> SR-IOV devices follow the same pattern as the physical ones. VFs of a
>>> master port can bind VFs of slave ports, if available, and operate as
>>> dual port devices.
>>>
>>> Examples of devices passed to a VM:
>>> (master)         - One net device, one IB device that has two ports. The slave
>>>                    port will always be down.
>>> (slave)          - One net device, no IB devices.
>>> (slave, slave)   - Two net devices and no IB devices.
>>> (master, master) - Two net devices, two IB devices, each with two ports.
>>>                    The slave port of each device will always be down.
>>> (master, slave)  - Two net devices, one IB device, with two ports. Both
>>>                     ports can be used.
>>>
>>> There are no changes to the existing design for net devices.
>>>
>>> The feature is disabled by default and it is enabled in firmware with mlxconfig.
>> Dan, so what happens w.r.t. the different combinations specified above
>> if, on an SR-IOV setup where the admin enabled the feature, VFs land in
>> VMs whose mlx5 driver doesn't have these changes?
>
> In this case it would look the same as it does today: one IB device per PCI device,
> with one port each. The old driver doesn't know about the new capabilities, so it wouldn't bind the ports.

sounds reasonable

> Operating this way isn't ideal, though. In DPR mode the FW biases the schedule queue
> allocation toward the master port, so performance could be worse on the VFs that would
> otherwise be slave ports if they are used as normal IB devices.

So the FW doesn't have any indication that these driver instances are
unaware of what's going on? That doesn't sound ideal, as you said, but
such is life. It's hard to come up with designs that let the different
unmodified/modified PF/VF combinations (U/M, M/U) all work at their
best, but we should aim there. This time it sounds like that didn't
work out and we will have to live with it, unless you can think of a
way for the FW to detect unaware instances itself and avoid the bias.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH rdma-next v2 00/14] index e382a3ca759e..d4a471a76d82 100644
       [not found]             ` <CAJ3xEMhVxBqJUwdRn3BXJPnf9J-_gXtrWrQgun3bCEW7W7Oq1w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-05 15:03               ` Daniel Jurgens
       [not found]                 ` <792bac90-b1eb-110f-399c-08d5309d92dd-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Jurgens @ 2018-01-05 15:03 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, Jason Gunthorpe, RDMA mailing list, Parav Pandit,
	Leon Romanovsky

On 1/5/2018 8:59 AM, Or Gerlitz wrote:
> On Thu, Jan 4, 2018 at 11:50 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> On 1/4/2018 1:53 PM, Or Gerlitz wrote:
>>> On Thu, Jan 4, 2018 at 5:25 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>>  v1 -> v2:
>>>>   * Dropped "IB/mlx5: Use correct mdev for vport queries in ib_virt"
>>>>  v0 -> v1:
>>>>   * Rebased to latest rdma/for-next
>>>>   * Enriched commit messages.
>>> The V2 post LGTM re the points I was commenting on: the IB virt patch
>>> was removed, and the cover letter + change-log properly elaborate on
>>> how the feature is configured and what the resulting IB devices would
>>> be under the different variations. See one followup below:
>>>
>>> [..]
>>>
>>>> SR-IOV devices follow the same pattern as the physical ones. VFs of a
>>>> master port can bind VFs of slave ports, if available, and operate as
>>>> dual port devices.
>>>>
>>>> Examples of devices passed to a VM:
>>>> (master)         - One net device, one IB device that has two ports. The slave
>>>>                    port will always be down.
>>>> (slave)          - One net device, no IB devices.
>>>> (slave, slave)   - Two net devices and no IB devices.
>>>> (master, master) - Two net devices, two IB devices, each with two ports.
>>>>                    The slave port of each device will always be down.
>>>> (master, slave)  - Two net devices, one IB device, with two ports. Both
>>>>                     ports can be used.
>>>>
>>>> There are no changes to the existing design for net devices.
>>>>
>>>> The feature is disabled by default and it is enabled in firmware with mlxconfig.
>>> Dan, so what happens w.r.t. the different combinations specified above
>>> if, on an SR-IOV setup where the admin enabled the feature, VFs land in
>>> VMs whose mlx5 driver doesn't have these changes?
>> In this case it would look the same as it does today: one IB device per PCI device,
>> with one port each. The old driver doesn't know about the new capabilities, so it wouldn't bind the ports.
> sounds reasonable
>
>> Operating this way isn't ideal, though. In DPR mode the FW biases the schedule queue
>> allocation toward the master port, so performance could be worse on the VFs that would
>> otherwise be slave ports if they are used as normal IB devices.
> So the FW doesn't have any indication that these driver instances are
> unaware of what's going on? That doesn't sound ideal, as you said, but
> such is life. It's hard to come up with designs that let the different
> unmodified/modified PF/VF combinations (U/M, M/U) all work at their
> best, but we should aim there. This time it sounds like that didn't
> work out and we will have to live with it, unless you can think of a
> way for the FW to detect unaware instances itself and avoid the bias.

The schedule queue allocation happens during FW boot, even before the PF drivers are loaded.  I don't think there's any way around it in this case. 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH rdma-next v2 00/14] index e382a3ca759e..d4a471a76d82 100644
       [not found]                 ` <792bac90-b1eb-110f-399c-08d5309d92dd-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2018-01-05 15:22                   ` Or Gerlitz
       [not found]                     ` <CAJ3xEMjOGEQQf+y4wEEPiS_F9fhYmDwBmd=ucUDQX_9416gzKQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Or Gerlitz @ 2018-01-05 15:22 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Doug Ledford, Jason Gunthorpe, RDMA mailing list, Parav Pandit,
	Leon Romanovsky

On Fri, Jan 5, 2018 at 5:03 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On 1/5/2018 8:59 AM, Or Gerlitz wrote:

>>> Operating this way isn't ideal, though. In DPR mode the FW biases the schedule queue
>>> allocation toward the master port, so performance could be worse on the VFs that would
>>> otherwise be slave ports if they are used as normal IB devices.

>> So the FW doesn't have any indication that these driver instances are
>> unaware of what's going on? That doesn't sound ideal, as you said, but
>> such is life. It's hard to come up with designs that let the different
>> unmodified/modified PF/VF combinations (U/M, M/U) all work at their
>> best, but we should aim there. This time it sounds like that didn't
>> work out and we will have to live with it, unless you can think of a
>> way for the FW to detect unaware instances itself and avoid the bias.

> The schedule queue allocation happens during FW boot, even before the PF drivers are loaded.
> I don't think there's any way around it in this case.

You mentioned PF drivers.. in a multi-host config we can run into the
same U/M M/U problem: some hosts/PF instances run the modified code
while other hosts/PFs run unmodified code. Or, in this case, will the
U PFs simply not enable the feature? And if that is the case, does the
FW know how to handle U PFs and their VFs?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH rdma-next v2 00/14] index e382a3ca759e..d4a471a76d82 100644
       [not found]                     ` <CAJ3xEMjOGEQQf+y4wEEPiS_F9fhYmDwBmd=ucUDQX_9416gzKQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-05 15:57                       ` Daniel Jurgens
       [not found]                         ` <d15d980b-1d67-277e-60f8-0b7e92035a06-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Jurgens @ 2018-01-05 15:57 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, Jason Gunthorpe, RDMA mailing list, Parav Pandit,
	Leon Romanovsky

On 1/5/2018 9:22 AM, Or Gerlitz wrote:
> On Fri, Jan 5, 2018 at 5:03 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> On 1/5/2018 8:59 AM, Or Gerlitz wrote:
>>>> Operating this way isn't ideal, though. In DPR mode the FW biases the schedule queue
>>>> allocation toward the master port, so performance could be worse on the VFs that would
>>>> otherwise be slave ports if they are used as normal IB devices.
>>> So the FW doesn't have any indication that these driver instances are
>>> unaware of what's going on? That doesn't sound ideal, as you said, but
>>> such is life. It's hard to come up with designs that let the different
>>> unmodified/modified PF/VF combinations (U/M, M/U) all work at their
>>> best, but we should aim there. This time it sounds like that didn't
>>> work out and we will have to live with it, unless you can think of a
>>> way for the FW to detect unaware instances itself and avoid the bias.
>> The schedule queue allocation happens during FW boot, even before the PF drivers are loaded.
>> I don't think there's any way around it in this case.
> You mentioned PF drivers.. in a multi-host config we can run into the
> same U/M M/U problem: some hosts/PF instances run the modified code
> while other hosts/PFs run unmodified code. Or, in this case, will the
> U PFs simply not enable the feature? And if that is the case, does the
> FW know how to handle U PFs and their VFs?

An unmodified driver on one side of the multihost, or even a non-multihost configuration with unmodified drivers and a card with this feature enabled, would be the same as the VM scenario: a single one-port IB device per function, with the unbalanced allocation of schedule queues in FW.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH rdma-next v2 00/14] index e382a3ca759e..d4a471a76d82 100644
       [not found]                         ` <d15d980b-1d67-277e-60f8-0b7e92035a06-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2018-01-05 20:44                           ` Or Gerlitz
  0 siblings, 0 replies; 26+ messages in thread
From: Or Gerlitz @ 2018-01-05 20:44 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Doug Ledford, Jason Gunthorpe, RDMA mailing list, Parav Pandit,
	Leon Romanovsky

On Fri, Jan 5, 2018 at 5:57 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> An unmodified driver on one side of the multihost, or even a non-multihost configuration with unmodified drivers and a card with this feature enabled, would be the same as the VM scenario: a single one-port IB device per function, with the unbalanced allocation of schedule queues in FW.

Got it. I guess if we want we can later try to improve things on the FW
side using the proposed driver/FW API, since the FW does know if a driver
instance is unaware (they use a host id of all 0s).
^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH rdma-next v2 00/14] index e382a3ca759e..d4a471a76d82 100644
       [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (14 preceding siblings ...)
  2018-01-04 19:53   ` [PATCH rdma-next v2 00/14] index e382a3ca759e..d4a471a76d82 100644 Or Gerlitz
@ 2018-01-09  2:51   ` Jason Gunthorpe
  15 siblings, 0 replies; 26+ messages in thread
From: Jason Gunthorpe @ 2018-01-09  2:51 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Daniel Jurgens, Parav Pandit

On Thu, Jan 04, 2018 at 05:25:30PM +0200, Leon Romanovsky wrote:
> Changelog:
>  v1 -> v2:
>   * Dropped "IB/mlx5: Use correct mdev for vport queries in ib_virt"
>  v0 -> v1:
>   * Rebased to latest rdma/for-next
>   * Enriched commit messages.
> 
> Thanks
> 
> >From Daniel:
> 
> This feature allows RDMA resources (pd, mr, cq, qp, etc) to be used with
> both physical ports of capable mlx5 devices. When enabled a single IB
> device with two ports will be registered instead of two single port
> devices.
> 
> There are still two PCI devices underlying the two port device, the
> capabilities indicate which device is the "master" device and which is
> the slave.
> 
> When the add callback function is called for a slave device, a list of
> IB devices is searched for a matching master device, indicated by the
> capabilities and the system_image_guid. If a match is found the slave
> is bound to the master device; otherwise it's placed on a list, in case
> its master becomes available in the future. When a master device is
> added, it searches the list of available slaves for a matching slave
> device. If a match is found it binds the slave as its 2nd port. If no
> match is found, the device still appears as a dual port device, with
> the 2nd port down. RDMA resources can still be created that use the yet
> unavailable 2nd port.
> 
> Commands related to IB resources are all routed through the master
> mlx5_core device. Port specific commands, like those for hardware
> counters are routed to their respective port mlx5_core device. Since
> devices can appear and disappear asynchronously a reference count on the
> underlying mlx5_core device is maintained. Getting and putting this
> reference is only necessary for commands destined to a specific port,
> the master core device can be used freely, as it will exist while the IB
> device exists.
> 
> SR-IOV devices follow the same pattern as the physical ones. VFs of a
> master port can bind VFs of slave ports, if available, and operate as
> dual port devices.
> 
> Examples of devices passed to a VM:
> (master)         - One net device, one IB device that has two ports. The slave
>                    port will always be down.
> (slave)          - One net device, no IB devices.
> (slave, slave)   - Two net devices and no IB devices.
> (master, master) - Two net devices, two IB devices, each with two ports.
>                    The slave port of each device will always be down.
> (master, slave)  - Two net devices, one IB device, with two ports. Both
> 		    ports can be used.
> 
> There are no changes to the existing design for net devices.
> 
> The feature is disabled by default and it is enabled in firmware
> with mlxconfig.

> 	Thanks
> 
> Daniel Jurgens (13):
>   net/mlx5: Fix race for multiple RoCE enable
>   net/mlx5: Set software owner ID during init HCA
>   IB/core: Change roce_rescan_device to return void
>   IB/mlx5: Reduce the use of num_port capability
>   IB/mlx5: Make netdev notifications multiport capable
>   {net,IB}/mlx5: Manage port association for multiport RoCE
>   IB/mlx5: Move IB event processing onto a workqueue
>   IB/mlx5: Implement dual port functionality in query routines
>   IB/mlx5: Update counter implementation for dual port RoCE
>   {net,IB}/mlx5: Change set_roce_gid to take a port number
>   IB/mlx5: Route MADs for dual port RoCE
>   IB/mlx5: Don't advertise RAW QP support in dual port mode
>   net/mlx5: Set num_vhca_ports capability
> 
> Parav Pandit (1):
>   IB/mlx5: Change debugfs to have per port contents

I made the revisions I noted in the followup mails and applied
this to for-next.

Thanks,
Jason
^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-01-09  2:51 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-04 15:25 [PATCH rdma-next v2 00/14] index e382a3ca759e..d4a471a76d82 100644 Leon Romanovsky
     [not found] ` <20180104152544.28919-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2018-01-04 15:25   ` [PATCH rdma-next v2 01/14] net/mlx5: Fix race for multiple RoCE enable Leon Romanovsky
2018-01-04 15:25   ` [PATCH rdma-next v2 02/14] net/mlx5: Set software owner ID during init HCA Leon Romanovsky
2018-01-04 15:25   ` [PATCH rdma-next v2 03/14] IB/core: Change roce_rescan_device to return void Leon Romanovsky
2018-01-04 15:25   ` [PATCH rdma-next v2 04/14] IB/mlx5: Reduce the use of num_port capability Leon Romanovsky
2018-01-04 15:25   ` [PATCH rdma-next v2 05/14] IB/mlx5: Make netdev notifications multiport capable Leon Romanovsky
2018-01-04 15:25   ` [PATCH rdma-next v2 06/14] {net,IB}/mlx5: Manage port association for multiport RoCE Leon Romanovsky
     [not found]     ` <20180104152544.28919-7-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2018-01-04 23:33       ` Jason Gunthorpe
     [not found]         ` <20180104233334.GA18993-uk2M96/98Pc@public.gmane.org>
2018-01-05  6:36           ` Leon Romanovsky
2018-01-05 13:47           ` Daniel Jurgens
2018-01-04 15:25   ` [PATCH rdma-next v2 07/14] IB/mlx5: Move IB event processing onto a workqueue Leon Romanovsky
2018-01-04 15:25   ` [PATCH rdma-next v2 08/14] IB/mlx5: Implement dual port functionality in query routines Leon Romanovsky
2018-01-04 15:25   ` [PATCH rdma-next v2 09/14] IB/mlx5: Change debugfs to have per port contents Leon Romanovsky
2018-01-04 15:25   ` [PATCH rdma-next v2 10/14] IB/mlx5: Update counter implementation for dual port RoCE Leon Romanovsky
2018-01-04 15:25   ` [PATCH rdma-next v2 11/14] {net,IB}/mlx5: Change set_roce_gid to take a port number Leon Romanovsky
2018-01-04 15:25   ` [PATCH rdma-next v2 12/14] IB/mlx5: Route MADs for dual port RoCE Leon Romanovsky
2018-01-04 15:25   ` [PATCH rdma-next v2 13/14] IB/mlx5: Don't advertise RAW QP support in dual port mode Leon Romanovsky
2018-01-04 15:25   ` [PATCH rdma-next v2 14/14] net/mlx5: Set num_vhca_ports capability Leon Romanovsky
2018-01-04 19:53   ` [PATCH rdma-next v2 00/14] index e382a3ca759e..d4a471a76d82 100644 Or Gerlitz
     [not found]     ` <CAJ3xEMjptQf99ZdGxkB6kTh9LkQ7-pjiObMh+knDWXFStPOg7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-04 21:50       ` Daniel Jurgens
     [not found]         ` <2618f285-6fa4-06ac-4b4a-51d0b2660b61-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2018-01-05 14:59           ` Or Gerlitz
     [not found]             ` <CAJ3xEMhVxBqJUwdRn3BXJPnf9J-_gXtrWrQgun3bCEW7W7Oq1w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-05 15:03               ` Daniel Jurgens
     [not found]                 ` <792bac90-b1eb-110f-399c-08d5309d92dd-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2018-01-05 15:22                   ` Or Gerlitz
     [not found]                     ` <CAJ3xEMjOGEQQf+y4wEEPiS_F9fhYmDwBmd=ucUDQX_9416gzKQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-05 15:57                       ` Daniel Jurgens
     [not found]                         ` <d15d980b-1d67-277e-60f8-0b7e92035a06-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2018-01-05 20:44                           ` Or Gerlitz
2018-01-09  2:51   ` Jason Gunthorpe
