* [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE
@ 2017-12-24 12:57 Leon Romanovsky
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
       [not found] ` <CAJ3xEMhZgEee+VLpV4bV150siOdXwpcp64AGqeqr5Y2o--WRdw@mail.gmail.com>
  0 siblings, 2 replies; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From Daniel:

This feature allows RDMA resources (pd, mr, cq, qp, etc.) to be used with
both physical ports of capable mlx5 devices. When enabled, a single IB
device with two ports is registered instead of two single-port
devices.

There are still two PCI devices underlying the two-port device; the
capabilities indicate which device is the "master" device and which is
the slave.

When the add callback function is called for a slave device, the list of
IB devices is searched for a matching master device, identified by the
capabilities and the system_image_guid. If a match is found the slave is
bound to the master device; otherwise it is placed on a list in case its
master becomes available in the future. When a master device is added it
searches the list of available slaves for a matching slave device. If a
match is found it binds the slave as its 2nd port. If no match is found
the device still appears as a dual port device, with the 2nd port down.
RDMA resources can still be created that use the not yet available 2nd
port.
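
As an illustrative sketch only (simplified from the binding logic
introduced later in this series), the pairing step amounts to matching
the system_image_guid and native port number of each unaffiliated slave
against the master:

	list_for_each_entry(mpi, &mlx5_ib_unaffiliated_port_list, list) {
		if (dev->sys_image_guid == mpi->sys_image_guid &&
		    mlx5_core_native_port_num(mpi->mdev) - 1 == i) {
			bound = mlx5_ib_bind_slave_port(dev, mpi);
			if (bound) {
				list_del(&mpi->list);
				break;
			}
		}
	}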

Commands related to IB resources are all routed through the master
mlx5_core device. Port-specific commands, like those for hardware
counters, are routed to the mlx5_core device of their respective port.
Since devices can appear and disappear asynchronously, a reference count
on the underlying mlx5_core device is maintained. Getting and putting
this reference is only necessary for commands destined for a specific
port; the master core device can be used freely, as it will exist as
long as the IB device exists.
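
Roughly, the intended pattern for a port-specific command looks like the
sketch below (function names are those introduced later in this series;
error handling elided, so treat it as illustrative only):

	mdev = mlx5_ib_get_native_port_mdev(ibdev, port_num, NULL);
	if (!mdev)
		return;		/* port is currently unaffiliated */

	/* ... issue the port-specific command through mdev ... */

	mlx5_ib_put_native_port_mdev(ibdev, port_num);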

SR-IOV devices follow the same pattern as the physical ones. VFs of a master
port can bind VFs of slave ports, if available, and operate as dual port
devices.

The patches are available in the git repository at:
  git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git tags/rdma-next-2017-12-24-2

	Thanks
---------------------------------------

Daniel Jurgens (14):
  net/mlx5: Fix race for multiple RoCE enable
  net/mlx5: Set software owner ID during init HCA
  IB/core: Change roce_rescan_device to return void
  IB/mlx5: Reduce the use of num_port capability
  IB/mlx5: Make netdev notifications multiport capable
  {net,IB}/mlx5: Manage port association for multiport RoCE
  IB/mlx5: Move IB event processing onto a workqueue
  IB/mlx5: Implement dual port functionality in query routines
  IB/mlx5: Update counter implementation for dual port RoCE
  {net,IB}/mlx5: Change set_roce_gid to take a port number
  IB/mlx5: Route MADs for dual port RoCE
  IB/mlx5: Use correct mdev for vport queries in ib_virt
  IB/mlx5: Don't advertise RAW QP support in dual port mode
  net/mlx5: Set num_vhca_ports capability

Parav Pandit (1):
  IB/mlx5: Change debugfs to have per port contents

 drivers/infiniband/core/cache.c                    |   7 +-
 drivers/infiniband/core/core_priv.h                |   1 -
 drivers/infiniband/core/roce_gid_mgmt.c            |  13 +-
 drivers/infiniband/hw/mlx5/cong.c                  |  83 ++-
 drivers/infiniband/hw/mlx5/ib_virt.c               |  84 ++-
 drivers/infiniband/hw/mlx5/mad.c                   |  23 +-
 drivers/infiniband/hw/mlx5/main.c                  | 791 +++++++++++++++++----
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |  41 +-
 drivers/infiniband/hw/mlx5/qp.c                    |   8 +-
 .../net/ethernet/mellanox/mlx5/core/fpga/conn.c    |  11 +-
 drivers/net/ethernet/mellanox/mlx5/core/fw.c       |  10 +-
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c  |   5 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |  12 +-
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h    |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c    |  91 ++-
 include/linux/mlx5/device.h                        |   5 +
 include/linux/mlx5/driver.h                        |  29 +-
 include/linux/mlx5/mlx5_ifc.h                      |  32 +-
 include/linux/mlx5/vport.h                         |   4 +
 include/rdma/ib_verbs.h                            |   8 +
 21 files changed, 1037 insertions(+), 225 deletions(-)

--
2.15.1


* [PATCH rdma-next 01/15] net/mlx5: Fix race for multiple RoCE enable
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2017-12-24 12:57   ` Leon Romanovsky
       [not found]     ` <20171224125741.25464-2-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2017-12-24 12:57   ` [PATCH rdma-next 02/15] net/mlx5: Set software owner ID during init HCA Leon Romanovsky
                     ` (14 subsequent siblings)
  15 siblings, 1 reply; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

There are two potential problems with the existing implementation:

1. Enable and disable can race after the atomic operations.
2. If a command fails, the refcount is left in an inconsistent state.

Introduce a lock and perform error checking.
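
The first problem can be hit with an interleaving like the following
(illustrative only):

	CPU0: mlx5_nic_vport_enable_roce()
	        atomic_inc_return(&roce_en) == 1
	CPU1: mlx5_nic_vport_enable_roce()
	        atomic_inc_return(&roce_en) == 2, returns 0
	      CPU1's caller proceeds, but RoCE is not yet enabled in FW
	CPU0:   mlx5_nic_vport_update_roce_state(ENABLED)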

Fixes: a6f7d2aff623 ("net/mlx5: Add support for multiple RoCE enable")
Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/vport.c | 33 ++++++++++++++++++++-----
 include/linux/mlx5/driver.h                     |  2 +-
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index d653b0025b13..916523103f16 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -36,6 +36,9 @@
 #include <linux/mlx5/vport.h>
 #include "mlx5_core.h"
 
+/* Mutex to hold while enabling or disabling RoCE */
+static DEFINE_MUTEX(mlx5_roce_en_lock);
+
 static int _mlx5_query_vport_state(struct mlx5_core_dev *mdev, u8 opmod,
 				   u16 vport, u32 *out, int outlen)
 {
@@ -988,17 +991,35 @@ static int mlx5_nic_vport_update_roce_state(struct mlx5_core_dev *mdev,
 
 int mlx5_nic_vport_enable_roce(struct mlx5_core_dev *mdev)
 {
-	if (atomic_inc_return(&mdev->roce.roce_en) != 1)
-		return 0;
-	return mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_ENABLED);
+	int err = 0;
+
+	mutex_lock(&mlx5_roce_en_lock);
+	if (!mdev->roce.roce_en)
+		err = mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_ENABLED);
+
+	if (!err)
+		mdev->roce.roce_en++;
+	mutex_unlock(&mlx5_roce_en_lock);
+
+	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_nic_vport_enable_roce);
 
 int mlx5_nic_vport_disable_roce(struct mlx5_core_dev *mdev)
 {
-	if (atomic_dec_return(&mdev->roce.roce_en) != 0)
-		return 0;
-	return mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_DISABLED);
+	int err = 0;
+
+	mutex_lock(&mlx5_roce_en_lock);
+	if (mdev->roce.roce_en) {
+		mdev->roce.roce_en--;
+		if (mdev->roce.roce_en == 0)
+			err = mlx5_nic_vport_update_roce_state(mdev, MLX5_VPORT_ROCE_DISABLED);
+
+		if (err)
+			mdev->roce.roce_en++;
+	}
+	mutex_unlock(&mlx5_roce_en_lock);
+	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_nic_vport_disable_roce);
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index a886b51511ab..8becaa54aeea 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -824,7 +824,7 @@ struct mlx5_core_dev {
 	struct mlx5e_resources  mlx5e_res;
 	struct {
 		struct mlx5_rsvd_gids	reserved_gids;
-		atomic_t                roce_en;
+		u32			roce_en;
 	} roce;
 #ifdef CONFIG_MLX5_FPGA
 	struct mlx5_fpga_device *fpga;
-- 
2.15.1


* [PATCH rdma-next 02/15] net/mlx5: Set software owner ID during init HCA
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2017-12-24 12:57   ` [PATCH rdma-next 01/15] net/mlx5: Fix race for multiple RoCE enable Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
       [not found]     ` <20171224125741.25464-3-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2017-12-24 12:57   ` [PATCH rdma-next 03/15] IB/core: Change roce_rescan_device to return void Leon Romanovsky
                     ` (13 subsequent siblings)
  15 siblings, 1 reply; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Generate a unique 128-bit identifier for each host and pass that value
to firmware in the INIT_HCA command if it reports the sw_owner_id
capability. This value is used by FW to determine whether functions are
in use by the same host.
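
Condensed from the diff below, the flow is: generate the ID once at
module init, then copy it into the INIT_HCA input whenever the
capability is reported:

	static u32 sw_owner_id[4];

	/* once, at module init */
	get_random_bytes(&sw_owner_id, sizeof(sw_owner_id));

	/* in mlx5_cmd_init_hca(), only when the cap is set */
	if (MLX5_CAP_GEN(dev, sw_owner_id))
		for (i = 0; i < 4; i++)
			MLX5_ARRAY_SET(init_hca_in, in, sw_owner_id, i,
				       sw_owner_id[i]);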

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/fw.c        | 10 +++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c      |  6 +++++-
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h |  2 +-
 include/linux/mlx5/device.h                         |  5 +++++
 include/linux/mlx5/mlx5_ifc.h                       |  5 ++++-
 5 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 5ef1b56b6a96..9d11e92fb541 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -195,12 +195,20 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
 	return 0;
 }
 
-int mlx5_cmd_init_hca(struct mlx5_core_dev *dev)
+int mlx5_cmd_init_hca(struct mlx5_core_dev *dev, uint32_t *sw_owner_id)
 {
 	u32 out[MLX5_ST_SZ_DW(init_hca_out)] = {0};
 	u32 in[MLX5_ST_SZ_DW(init_hca_in)]   = {0};
+	int i;
 
 	MLX5_SET(init_hca_in, in, opcode, MLX5_CMD_OP_INIT_HCA);
+
+	if (MLX5_CAP_GEN(dev, sw_owner_id)) {
+		for (i = 0; i < 4; i++)
+			MLX5_ARRAY_SET(init_hca_in, in, sw_owner_id, i,
+				       sw_owner_id[i]);
+	}
+
 	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 5f323442cc5a..5f3dc0ede917 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -75,6 +75,8 @@ static unsigned int prof_sel = MLX5_DEFAULT_PROF;
 module_param_named(prof_sel, prof_sel, uint, 0444);
 MODULE_PARM_DESC(prof_sel, "profile selector. Valid range 0 - 2");
 
+static u32 sw_owner_id[4];
+
 enum {
 	MLX5_ATOMIC_REQ_MODE_BE = 0x0,
 	MLX5_ATOMIC_REQ_MODE_HOST_ENDIANNESS = 0x1,
@@ -1052,7 +1054,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 		goto reclaim_boot_pages;
 	}
 
-	err = mlx5_cmd_init_hca(dev);
+	err = mlx5_cmd_init_hca(dev, sw_owner_id);
 	if (err) {
 		dev_err(&pdev->dev, "init hca failed\n");
 		goto err_pagealloc_stop;
@@ -1574,6 +1576,8 @@ static int __init init(void)
 {
 	int err;
 
+	get_random_bytes(&sw_owner_id, sizeof(sw_owner_id));
+
 	mlx5_core_verify_params();
 	mlx5_register_debugfs();
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index ff4a0b889a6f..b05868728da7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -86,7 +86,7 @@ enum {
 
 int mlx5_query_hca_caps(struct mlx5_core_dev *dev);
 int mlx5_query_board_id(struct mlx5_core_dev *dev);
-int mlx5_cmd_init_hca(struct mlx5_core_dev *dev);
+int mlx5_cmd_init_hca(struct mlx5_core_dev *dev, uint32_t *sw_owner_id);
 int mlx5_cmd_teardown_hca(struct mlx5_core_dev *dev);
 int mlx5_cmd_force_teardown_hca(struct mlx5_core_dev *dev);
 void mlx5_core_event(struct mlx5_core_dev *dev, enum mlx5_dev_event event,
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 409ffb14298a..18c041966ab8 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -79,6 +79,11 @@
 		     << __mlx5_dw_bit_off(typ, fld))); \
 } while (0)
 
+#define MLX5_ARRAY_SET(typ, p, fld, idx, v) do { \
+	BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 32); \
+	MLX5_SET(typ, p, fld[idx], v); \
+} while (0)
+
 #define MLX5_SET_TO_ONES(typ, p, fld) do { \
 	BUILD_BUG_ON(__mlx5_st_sz_bits(typ) % 32);             \
 	*((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 38a7577a9ce7..b1c81d7a86cb 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1066,7 +1066,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         reserved_at_5f8[0x3];
 	u8         log_max_xrq[0x5];
 
-	u8         reserved_at_600[0x200];
+	u8         reserved_at_600[0x1e];
+	u8	   sw_owner_id[0x1];
+	u8	   reserved_at_61f[0x1e1];
 };
 
 enum mlx5_flow_destination_type {
@@ -5531,6 +5533,7 @@ struct mlx5_ifc_init_hca_in_bits {
 	u8         op_mod[0x10];
 
 	u8         reserved_at_40[0x40];
+	u8	   sw_owner_id[4][0x20];
 };
 
 struct mlx5_ifc_init2rtr_qp_out_bits {
-- 
2.15.1


* [PATCH rdma-next 03/15] IB/core: Change roce_rescan_device to return void
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2017-12-24 12:57   ` [PATCH rdma-next 01/15] net/mlx5: Fix race for multiple RoCE enable Leon Romanovsky
  2017-12-24 12:57   ` [PATCH rdma-next 02/15] net/mlx5: Set software owner ID during init HCA Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
  2017-12-24 12:57   ` [PATCH rdma-next 04/15] IB/mlx5: Reduce the use of num_port capability Leon Romanovsky
                     ` (12 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

It always returns 0. Change return type to void.

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/cache.c         | 7 +------
 drivers/infiniband/core/core_priv.h     | 2 +-
 drivers/infiniband/core/roce_gid_mgmt.c | 4 +---
 3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 7babdbceb6d0..fc4022884dbb 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -821,12 +821,7 @@ static int gid_table_setup_one(struct ib_device *ib_dev)
 	if (err)
 		return err;
 
-	err = roce_rescan_device(ib_dev);
-
-	if (err) {
-		gid_table_cleanup_one(ib_dev);
-		gid_table_release_one(ib_dev);
-	}
+	roce_rescan_device(ib_dev);
 
 	return err;
 }
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 6c4541af54bb..6fab8bf0a2bc 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -137,7 +137,7 @@ int ib_cache_gid_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
 int roce_gid_mgmt_init(void);
 void roce_gid_mgmt_cleanup(void);
 
-int roce_rescan_device(struct ib_device *ib_dev);
+void roce_rescan_device(struct ib_device *ib_dev);
 unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port);
 
 int ib_cache_setup_one(struct ib_device *device);
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 90e3889b7fbe..ebfe45739ca7 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -412,12 +412,10 @@ static void enum_all_gids_of_dev_cb(struct ib_device *ib_dev,
 
 /* This function will rescan all of the network devices in the system
  * and add their gids, as needed, to the relevant RoCE devices. */
-int roce_rescan_device(struct ib_device *ib_dev)
+void roce_rescan_device(struct ib_device *ib_dev)
 {
 	ib_enum_roce_netdev(ib_dev, pass_all_filter, NULL,
 			    enum_all_gids_of_dev_cb, NULL);
-
-	return 0;
 }
 
 static void callback_for_addr_gid_device_scan(struct ib_device *device,
-- 
2.15.1


* [PATCH rdma-next 04/15] IB/mlx5: Reduce the use of num_port capability
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (2 preceding siblings ...)
  2017-12-24 12:57   ` [PATCH rdma-next 03/15] IB/core: Change roce_rescan_device to return void Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
  2017-12-24 12:57   ` [PATCH rdma-next 05/15] IB/mlx5: Make netdev notifications multiport capable Leon Romanovsky
                     ` (11 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Remove use of the num_ports general capability throughout. The number of
ports will be variable in the future and will be reported in a different
way.

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/mad.c  |  2 +-
 drivers/infiniband/hw/mlx5/main.c | 16 ++++++++--------
 drivers/infiniband/hw/mlx5/qp.c   |  5 ++---
 3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index 1003b0133a49..0559e0a9e398 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -519,7 +519,7 @@ int mlx5_query_mad_ifc_port(struct ib_device *ibdev, u8 port,
 	int ext_active_speed;
 	int err = -ENOMEM;
 
-	if (port < 1 || port > MLX5_CAP_GEN(mdev, num_ports)) {
+	if (port < 1 || port > dev->num_ports) {
 		mlx5_ib_warn(dev, "invalid port number %d\n", port);
 		return -EINVAL;
 	}
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 96cc51455b09..a28362264d1b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1468,7 +1468,7 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
 	mutex_init(&context->db_page_mutex);
 
 	resp.tot_bfregs = req.total_num_bfregs;
-	resp.num_ports = MLX5_CAP_GEN(dev->mdev, num_ports);
+	resp.num_ports = dev->num_ports;
 
 	if (field_avail(typeof(resp), cqe_version, udata->outlen))
 		resp.response_length += sizeof(resp.cqe_version);
@@ -2654,7 +2654,7 @@ static struct ib_flow *mlx5_ib_create_flow(struct ib_qp *qp,
 		return ERR_PTR(-ENOMEM);
 
 	if (domain != IB_FLOW_DOMAIN_USER ||
-	    flow_attr->port > MLX5_CAP_GEN(dev->mdev, num_ports) ||
+	    flow_attr->port > dev->num_ports ||
 	    (flow_attr->flags & ~IB_FLOW_ATTR_FLAGS_DONT_TRAP))
 		return ERR_PTR(-EINVAL);
 
@@ -3002,7 +3002,7 @@ static int set_has_smi_cap(struct mlx5_ib_dev *dev)
 	int err;
 	int port;
 
-	for (port = 1; port <= MLX5_CAP_GEN(dev->mdev, num_ports); port++) {
+	for (port = 1; port <= dev->num_ports; port++) {
 		dev->mdev->port_caps[port - 1].has_smi = false;
 		if (MLX5_CAP_GEN(dev->mdev, port_type) ==
 		    MLX5_CAP_PORT_TYPE_IB) {
@@ -3029,7 +3029,7 @@ static void get_ext_port_caps(struct mlx5_ib_dev *dev)
 {
 	int port;
 
-	for (port = 1; port <= MLX5_CAP_GEN(dev->mdev, num_ports); port++)
+	for (port = 1; port <= dev->num_ports; port++)
 		mlx5_query_ext_port_caps(dev, port);
 }
 
@@ -3059,7 +3059,7 @@ static int get_port_caps(struct mlx5_ib_dev *dev)
 		goto out;
 	}
 
-	for (port = 1; port <= MLX5_CAP_GEN(dev->mdev, num_ports); port++) {
+	for (port = 1; port <= dev->num_ports; port++) {
 		memset(pprops, 0, sizeof(*pprops));
 		err = mlx5_ib_query_port(&dev->ib_dev, port, pprops);
 		if (err) {
@@ -3961,7 +3961,7 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	const char *name;
 	int err;
 
-	dev->port = kcalloc(MLX5_CAP_GEN(mdev, num_ports), sizeof(*dev->port),
+	dev->port = kcalloc(dev->num_ports, sizeof(*dev->port),
 			    GFP_KERNEL);
 	if (!dev->port)
 		return -ENOMEM;
@@ -3983,8 +3983,7 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	dev->ib_dev.owner		= THIS_MODULE;
 	dev->ib_dev.node_type		= RDMA_NODE_IB_CA;
 	dev->ib_dev.local_dma_lkey	= 0 /* not supported for now */;
-	dev->num_ports		= MLX5_CAP_GEN(mdev, num_ports);
-	dev->ib_dev.phys_port_cnt     = dev->num_ports;
+	dev->ib_dev.phys_port_cnt	= dev->num_ports;
 	dev->ib_dev.num_comp_vectors    =
 		dev->mdev->priv.eq_table.num_comp_vectors;
 	dev->ib_dev.dev.parent		= &mdev->pdev->dev;
@@ -4342,6 +4341,7 @@ static void *__mlx5_ib_add(struct mlx5_core_dev *mdev,
 		return NULL;
 
 	dev->mdev = mdev;
+	dev->num_ports = MLX5_CAP_GEN(mdev, num_ports);
 
 	for (i = 0; i < MLX5_IB_STAGE_MAX; i++) {
 		if (profile->stage[i].init) {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 31ad28853efa..2b5cd7bd58d1 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3039,7 +3039,7 @@ int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 
 	if ((attr_mask & IB_QP_PORT) &&
 	    (attr->port_num == 0 ||
-	     attr->port_num > MLX5_CAP_GEN(dev->mdev, num_ports))) {
+	     attr->port_num > dev->num_ports)) {
 		mlx5_ib_dbg(dev, "invalid port number %d. number of ports is %d\n",
 			    attr->port_num, dev->num_ports);
 		goto out;
@@ -4358,14 +4358,13 @@ static void to_rdma_ah_attr(struct mlx5_ib_dev *ibdev,
 			    struct rdma_ah_attr *ah_attr,
 			    struct mlx5_qp_path *path)
 {
-	struct mlx5_core_dev *dev = ibdev->mdev;
 
 	memset(ah_attr, 0, sizeof(*ah_attr));
 
 	ah_attr->type = rdma_ah_find_type(&ibdev->ib_dev, path->port);
 	rdma_ah_set_port_num(ah_attr, path->port);
 	if (rdma_ah_get_port_num(ah_attr) == 0 ||
-	    rdma_ah_get_port_num(ah_attr) > MLX5_CAP_GEN(dev, num_ports))
+	    rdma_ah_get_port_num(ah_attr) > ibdev->num_ports)
 		return;
 
 	rdma_ah_set_port_num(ah_attr, path->port);
-- 
2.15.1


* [PATCH rdma-next 05/15] IB/mlx5: Make netdev notifications multiport capable
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (3 preceding siblings ...)
  2017-12-24 12:57   ` [PATCH rdma-next 04/15] IB/mlx5: Reduce the use of num_port capability Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
  2017-12-24 12:57   ` [PATCH rdma-next 06/15] {net,IB}/mlx5: Manage port association for multiport RoCE Leon Romanovsky
                     ` (10 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When multiple RoCE ports are supported, registration for events on
multiple netdevs is required. Refactor the event registration and
handling to support multiple ports.
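
The per-port plumbing added below boils down to embedding a
notifier_block in each struct mlx5_roce and recovering the port from it
in the callback, roughly:

	struct mlx5_roce *roce = container_of(this, struct mlx5_roce, nb);
	struct mlx5_ib_dev *ibdev = roce->dev;
	u8 port_num = roce->native_port_num;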

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c    | 85 +++++++++++++++++++++---------------
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  4 +-
 drivers/infiniband/hw/mlx5/qp.c      |  3 +-
 include/linux/mlx5/driver.h          |  5 +++
 4 files changed, 60 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index a28362264d1b..a6dea2a8a455 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -115,24 +115,30 @@ static int get_port_state(struct ib_device *ibdev,
 static int mlx5_netdev_event(struct notifier_block *this,
 			     unsigned long event, void *ptr)
 {
+	struct mlx5_roce *roce = container_of(this, struct mlx5_roce, nb);
 	struct net_device *ndev = netdev_notifier_info_to_dev(ptr);
-	struct mlx5_ib_dev *ibdev = container_of(this, struct mlx5_ib_dev,
-						 roce.nb);
+	u8 port_num = roce->native_port_num;
+	struct mlx5_core_dev *mdev;
+	struct mlx5_ib_dev *ibdev;
+
+	ibdev = roce->dev;
+	mdev = ibdev->mdev;
 
 	switch (event) {
 	case NETDEV_REGISTER:
 	case NETDEV_UNREGISTER:
-		write_lock(&ibdev->roce.netdev_lock);
-		if (ndev->dev.parent == &ibdev->mdev->pdev->dev)
-			ibdev->roce.netdev = (event == NETDEV_UNREGISTER) ?
-					     NULL : ndev;
-		write_unlock(&ibdev->roce.netdev_lock);
+		write_lock(&roce->netdev_lock);
+
+		if (ndev->dev.parent == &mdev->pdev->dev)
+			roce->netdev = (event == NETDEV_UNREGISTER) ?
+					NULL : ndev;
+		write_unlock(&roce->netdev_lock);
 		break;
 
 	case NETDEV_CHANGE:
 	case NETDEV_UP:
 	case NETDEV_DOWN: {
-		struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(ibdev->mdev);
+		struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(mdev);
 		struct net_device *upper = NULL;
 
 		if (lag_ndev) {
@@ -140,27 +146,28 @@ static int mlx5_netdev_event(struct notifier_block *this,
 			dev_put(lag_ndev);
 		}
 
-		if ((upper == ndev || (!upper && ndev == ibdev->roce.netdev))
+		if ((upper == ndev || (!upper && ndev == roce->netdev))
 		    && ibdev->ib_active) {
 			struct ib_event ibev = { };
 			enum ib_port_state port_state;
 
-			if (get_port_state(&ibdev->ib_dev, 1, &port_state))
-				return NOTIFY_DONE;
+			if (get_port_state(&ibdev->ib_dev, port_num,
+					   &port_state))
+				goto done;
 
-			if (ibdev->roce.last_port_state == port_state)
-				return NOTIFY_DONE;
+			if (roce->last_port_state == port_state)
+				goto done;
 
-			ibdev->roce.last_port_state = port_state;
+			roce->last_port_state = port_state;
 			ibev.device = &ibdev->ib_dev;
 			if (port_state == IB_PORT_DOWN)
 				ibev.event = IB_EVENT_PORT_ERR;
 			else if (port_state == IB_PORT_ACTIVE)
 				ibev.event = IB_EVENT_PORT_ACTIVE;
 			else
-				return NOTIFY_DONE;
+				goto done;
 
-			ibev.element.port_num = 1;
+			ibev.element.port_num = port_num;
 			ib_dispatch_event(&ibev);
 		}
 		break;
@@ -169,7 +176,7 @@ static int mlx5_netdev_event(struct notifier_block *this,
 	default:
 		break;
 	}
-
+done:
 	return NOTIFY_DONE;
 }
 
@@ -185,11 +192,11 @@ static struct net_device *mlx5_ib_get_netdev(struct ib_device *device,
 
 	/* Ensure ndev does not disappear before we invoke dev_hold()
 	 */
-	read_lock(&ibdev->roce.netdev_lock);
-	ndev = ibdev->roce.netdev;
+	read_lock(&ibdev->roce[port_num - 1].netdev_lock);
+	ndev = ibdev->roce[port_num - 1].netdev;
 	if (ndev)
 		dev_hold(ndev);
-	read_unlock(&ibdev->roce.netdev_lock);
+	read_unlock(&ibdev->roce[port_num - 1].netdev_lock);
 
 	return ndev;
 }
@@ -3459,33 +3466,33 @@ static void mlx5_eth_lag_cleanup(struct mlx5_ib_dev *dev)
 	}
 }
 
-static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev)
+static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
 {
 	int err;
 
-	dev->roce.nb.notifier_call = mlx5_netdev_event;
-	err = register_netdevice_notifier(&dev->roce.nb);
+	dev->roce[port_num].nb.notifier_call = mlx5_netdev_event;
+	err = register_netdevice_notifier(&dev->roce[port_num].nb);
 	if (err) {
-		dev->roce.nb.notifier_call = NULL;
+		dev->roce[port_num].nb.notifier_call = NULL;
 		return err;
 	}
 
 	return 0;
 }
 
-static void mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev)
+static void mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
 {
-	if (dev->roce.nb.notifier_call) {
-		unregister_netdevice_notifier(&dev->roce.nb);
-		dev->roce.nb.notifier_call = NULL;
+	if (dev->roce[port_num].nb.notifier_call) {
+		unregister_netdevice_notifier(&dev->roce[port_num].nb);
+		dev->roce[port_num].nb.notifier_call = NULL;
 	}
 }
 
-static int mlx5_enable_eth(struct mlx5_ib_dev *dev)
+static int mlx5_enable_eth(struct mlx5_ib_dev *dev, u8 port_num)
 {
 	int err;
 
-	err = mlx5_add_netdev_notifier(dev);
+	err = mlx5_add_netdev_notifier(dev, port_num);
 	if (err)
 		return err;
 
@@ -3506,7 +3513,7 @@ static int mlx5_enable_eth(struct mlx5_ib_dev *dev)
 		mlx5_nic_vport_disable_roce(dev->mdev);
 
 err_unregister_netdevice_notifier:
-	mlx5_remove_netdev_notifier(dev);
+	mlx5_remove_netdev_notifier(dev, port_num);
 	return err;
 }
 
@@ -3966,7 +3973,6 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	if (!dev->port)
 		return -ENOMEM;
 
-	rwlock_init(&dev->roce.netdev_lock);
 	err = get_port_caps(dev);
 	if (err)
 		goto err_free_port;
@@ -4140,12 +4146,21 @@ static int mlx5_ib_stage_roce_init(struct mlx5_ib_dev *dev)
 	struct mlx5_core_dev *mdev = dev->mdev;
 	enum rdma_link_layer ll;
 	int port_type_cap;
+	u8 port_num = 0;
 	int err;
+	int i;
 
 	port_type_cap = MLX5_CAP_GEN(mdev, port_type);
 	ll = mlx5_port_type_cap_to_rdma_ll(port_type_cap);
 
 	if (ll == IB_LINK_LAYER_ETHERNET) {
+		for (i = 0; i < dev->num_ports; i++) {
+			rwlock_init(&dev->roce[i].netdev_lock);
+			dev->roce[i].dev = dev;
+			dev->roce[i].native_port_num = i + 1;
+			dev->roce[i].last_port_state = IB_PORT_DOWN;
+		}
+
 		dev->ib_dev.get_netdev	= mlx5_ib_get_netdev;
 		dev->ib_dev.create_wq	 = mlx5_ib_create_wq;
 		dev->ib_dev.modify_wq	 = mlx5_ib_modify_wq;
@@ -4158,10 +4173,9 @@ static int mlx5_ib_stage_roce_init(struct mlx5_ib_dev *dev)
 			(1ull << IB_USER_VERBS_EX_CMD_DESTROY_WQ) |
 			(1ull << IB_USER_VERBS_EX_CMD_CREATE_RWQ_IND_TBL) |
 			(1ull << IB_USER_VERBS_EX_CMD_DESTROY_RWQ_IND_TBL);
-		err = mlx5_enable_eth(dev);
+		err = mlx5_enable_eth(dev, port_num);
 		if (err)
 			return err;
-		dev->roce.last_port_state = IB_PORT_DOWN;
 	}
 
 	return 0;
@@ -4172,13 +4186,14 @@ static void mlx5_ib_stage_roce_cleanup(struct mlx5_ib_dev *dev)
 	struct mlx5_core_dev *mdev = dev->mdev;
 	enum rdma_link_layer ll;
 	int port_type_cap;
+	u8 port_num = 0;
 
 	port_type_cap = MLX5_CAP_GEN(mdev, port_type);
 	ll = mlx5_port_type_cap_to_rdma_ll(port_type_cap);
 
 	if (ll == IB_LINK_LAYER_ETHERNET) {
 		mlx5_disable_eth(dev);
-		mlx5_remove_netdev_notifier(dev);
+		mlx5_remove_netdev_notifier(dev, port_num);
 	}
 }
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 8f762ac4a659..1107047c6f83 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -647,6 +647,8 @@ struct mlx5_roce {
 	struct notifier_block	nb;
 	atomic_t		next_port;
 	enum ib_port_state last_port_state;
+	struct mlx5_ib_dev	*dev;
+	u8			native_port_num;
 };
 
 struct mlx5_ib_dbg_param {
@@ -749,7 +751,7 @@ struct mlx5_ib_odp {
 struct mlx5_ib_dev {
 	struct ib_device		ib_dev;
 	struct mlx5_core_dev		*mdev;
-	struct mlx5_roce		roce;
+	struct mlx5_roce		roce[MLX5_MAX_PORTS];
 	int				num_ports;
 	/* serialize update of capability mask
 	 */
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 2b5cd7bd58d1..33b132a9a0fc 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -2796,8 +2796,9 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 		    (ibqp->qp_type == IB_QPT_XRC_INI) ||
 		    (ibqp->qp_type == IB_QPT_XRC_TGT)) {
 			if (mlx5_lag_is_active(dev->mdev)) {
+				u8 p = mlx5_core_native_port_num(dev->mdev);
 				tx_affinity = (unsigned int)atomic_add_return(1,
-						&dev->roce.next_port) %
+						&dev->roce[p].next_port) %
 						MLX5_MAX_PORTS + 1;
 				context->flags |= cpu_to_be32(tx_affinity << 24);
 			}
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 8becaa54aeea..cfcb91975323 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1219,6 +1219,11 @@ static inline bool mlx5_rl_is_supported(struct mlx5_core_dev *dev)
 	return !!(dev->priv.rl_table.max_size);
 }
 
+static inline int mlx5_core_native_port_num(struct mlx5_core_dev *dev)
+{
+	return 1;
+}
+
 enum {
 	MLX5_TRIGGERED_CMD_COMP = (u64)1 << 32,
 };
-- 
2.15.1


* [PATCH rdma-next 06/15] {net,IB}/mlx5: Manage port association for multiport RoCE
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (4 preceding siblings ...)
  2017-12-24 12:57   ` [PATCH rdma-next 05/15] IB/mlx5: Make netdev notifications multiport capable Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
  2017-12-24 12:57   ` [PATCH rdma-next 07/15] IB/mlx5: Move IB event processing onto a workqueue Leon Romanovsky
                     ` (9 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When mlx5_ib_add is called, determine whether the mlx5 core device being
added is capable of dual port RoCE operation. If it is, determine
whether it is a master device or a slave device using the
num_vhca_ports and affiliate_nic_vport_criteria capabilities.

If the device is a slave, attempt to find a master device to affiliate
it with. Devices that can be affiliated will share a system image guid.
If none is found, place it on a list of unaffiliated ports. If a master
is found, bind the port to it by configuring the port affiliation in the
NIC vport context.

Similarly, when mlx5_ib_remove is called, determine the port type. If
it's a slave port, unaffiliate it from the master device; otherwise just
remove it from the unaffiliated port list.

The IB device is registered as a multiport device, even if a 2nd port
is not available for affiliation. When the 2nd port is affiliated later,
the GID cache must be refreshed in order to get the default GIDs for
the 2nd port into the cache. Export roce_rescan_device to provide a
mechanism to refresh the cache after a new port is bound.

In a multiport configuration, all IB object (QP, MR, PD, etc.) related
commands should flow through the master mlx5_core_dev; other commands
must be sent to the slave port's mlx5_core_dev. An interface is
provided to get the correct mdev for non-IB-object commands.

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/cache.c                    |   2 +-
 drivers/infiniband/core/core_priv.h                |   1 -
 drivers/infiniband/core/roce_gid_mgmt.c            |  11 +-
 drivers/infiniband/hw/mlx5/main.c                  | 421 +++++++++++++++++++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h               |  29 ++
 .../net/ethernet/mellanox/mlx5/core/fpga/conn.c    |   4 +-
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/vport.c    |  58 +++
 include/linux/mlx5/driver.h                        |  22 +-
 include/linux/mlx5/mlx5_ifc.h                      |  31 +-
 include/linux/mlx5/vport.h                         |   4 +
 include/rdma/ib_verbs.h                            |   8 +
 12 files changed, 551 insertions(+), 42 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index fc4022884dbb..e9a409d7f4e2 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -821,7 +821,7 @@ static int gid_table_setup_one(struct ib_device *ib_dev)
 	if (err)
 		return err;
 
-	roce_rescan_device(ib_dev);
+	rdma_roce_rescan_device(ib_dev);
 
 	return err;
 }
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 6fab8bf0a2bc..ded3850721e0 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -137,7 +137,6 @@ int ib_cache_gid_del_all_netdev_gids(struct ib_device *ib_dev, u8 port,
 int roce_gid_mgmt_init(void);
 void roce_gid_mgmt_cleanup(void);
 
-void roce_rescan_device(struct ib_device *ib_dev);
 unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port);
 
 int ib_cache_setup_one(struct ib_device *device);
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index ebfe45739ca7..5a52ec77940a 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -410,13 +410,18 @@ static void enum_all_gids_of_dev_cb(struct ib_device *ib_dev,
 	rtnl_unlock();
 }
 
-/* This function will rescan all of the network devices in the system
- * and add their gids, as needed, to the relevant RoCE devices. */
-void roce_rescan_device(struct ib_device *ib_dev)
+/**
+ * rdma_roce_rescan_device - Rescan all of the network devices in the system
+ * and add their gids, as needed, to the relevant RoCE devices.
+ *
+ * @device:         the rdma device
+ */
+void rdma_roce_rescan_device(struct ib_device *ib_dev)
 {
 	ib_enum_roce_netdev(ib_dev, pass_all_filter, NULL,
 			    enum_all_gids_of_dev_cb, NULL);
 }
+EXPORT_SYMBOL(rdma_roce_rescan_device);
 
 static void callback_for_addr_gid_device_scan(struct ib_device *device,
 					      u8 port,
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index a6dea2a8a455..745c748a79a5 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -76,6 +76,23 @@ enum {
 	MLX5_ATOMIC_SIZE_QP_8BYTES = 1 << 3,
 };
 
+static LIST_HEAD(mlx5_ib_unaffiliated_port_list);
+static LIST_HEAD(mlx5_ib_dev_list);
+/*
+ * This mutex should be held when accessing either of the above lists
+ */
+static DEFINE_MUTEX(mlx5_ib_multiport_mutex);
+
+struct mlx5_ib_dev *mlx5_ib_get_ibdev_from_mpi(struct mlx5_ib_multiport_info *mpi)
+{
+	struct mlx5_ib_dev *dev;
+
+	mutex_lock(&mlx5_ib_multiport_mutex);
+	dev = mpi->ibdev;
+	mutex_unlock(&mlx5_ib_multiport_mutex);
+	return dev;
+}
+
 static enum rdma_link_layer
 mlx5_port_type_cap_to_rdma_ll(int port_type_cap)
 {
@@ -122,7 +139,9 @@ static int mlx5_netdev_event(struct notifier_block *this,
 	struct mlx5_ib_dev *ibdev;
 
 	ibdev = roce->dev;
-	mdev = ibdev->mdev;
+	mdev = mlx5_ib_get_native_port_mdev(ibdev, port_num, NULL);
+	if (!mdev)
+		return NOTIFY_DONE;
 
 	switch (event) {
 	case NETDEV_REGISTER:
@@ -177,6 +196,7 @@ static int mlx5_netdev_event(struct notifier_block *this,
 		break;
 	}
 done:
+	mlx5_ib_put_native_port_mdev(ibdev, port_num);
 	return NOTIFY_DONE;
 }
 
@@ -185,10 +205,15 @@ static struct net_device *mlx5_ib_get_netdev(struct ib_device *device,
 {
 	struct mlx5_ib_dev *ibdev = to_mdev(device);
 	struct net_device *ndev;
+	struct mlx5_core_dev *mdev;
+
+	mdev = mlx5_ib_get_native_port_mdev(ibdev, port_num, NULL);
+	if (!mdev)
+		return NULL;
 
-	ndev = mlx5_lag_get_roce_netdev(ibdev->mdev);
+	ndev = mlx5_lag_get_roce_netdev(mdev);
 	if (ndev)
-		return ndev;
+		goto out;
 
 	/* Ensure ndev does not disappear before we invoke dev_hold()
 	 */
@@ -198,9 +223,70 @@ static struct net_device *mlx5_ib_get_netdev(struct ib_device *device,
 		dev_hold(ndev);
 	read_unlock(&ibdev->roce[port_num - 1].netdev_lock);
 
+out:
+	mlx5_ib_put_native_port_mdev(ibdev, port_num);
 	return ndev;
 }
 
+struct mlx5_core_dev *mlx5_ib_get_native_port_mdev(struct mlx5_ib_dev *ibdev,
+						   u8 ib_port_num,
+						   u8 *native_port_num)
+{
+	enum rdma_link_layer ll = mlx5_ib_port_link_layer(&ibdev->ib_dev,
+							  ib_port_num);
+	struct mlx5_core_dev *mdev = NULL;
+	struct mlx5_ib_multiport_info *mpi;
+	struct mlx5_ib_port *port;
+
+	if (native_port_num)
+		*native_port_num = 1;
+
+	if (!mlx5_core_mp_enabled(ibdev->mdev) || ll != IB_LINK_LAYER_ETHERNET)
+		return ibdev->mdev;
+
+	port = &ibdev->port[ib_port_num - 1];
+	if (!port)
+		return NULL;
+
+	spin_lock(&port->mp.mpi_lock);
+	mpi = ibdev->port[ib_port_num - 1].mp.mpi;
+	if (mpi && !mpi->unaffiliate) {
+		mdev = mpi->mdev;
+		/* If it's the master no need to refcount, it'll exist
+		 * as long as the ib_dev exists.
+		 */
+		if (!mpi->is_master)
+			mpi->mdev_refcnt++;
+	}
+	spin_unlock(&port->mp.mpi_lock);
+
+	return mdev;
+}
+
+void mlx5_ib_put_native_port_mdev(struct mlx5_ib_dev *ibdev, u8 port_num)
+{
+	enum rdma_link_layer ll = mlx5_ib_port_link_layer(&ibdev->ib_dev,
+							  port_num);
+	struct mlx5_ib_multiport_info *mpi;
+	struct mlx5_ib_port *port;
+
+	if (!mlx5_core_mp_enabled(ibdev->mdev) || ll != IB_LINK_LAYER_ETHERNET)
+		return;
+
+	port = &ibdev->port[port_num - 1];
+
+	spin_lock(&port->mp.mpi_lock);
+	mpi = ibdev->port[port_num - 1].mp.mpi;
+	if (mpi->is_master)
+		goto out;
+
+	mpi->mdev_refcnt--;
+	if (mpi->unaffiliate)
+		complete(&mpi->unref_comp);
+out:
+	spin_unlock(&port->mp.mpi_lock);
+}
+
 static int translate_eth_proto_oper(u32 eth_proto_oper, u8 *active_speed,
 				    u8 *active_width)
 {
@@ -3040,12 +3126,11 @@ static void get_ext_port_caps(struct mlx5_ib_dev *dev)
 		mlx5_query_ext_port_caps(dev, port);
 }
 
-static int get_port_caps(struct mlx5_ib_dev *dev)
+static int get_port_caps(struct mlx5_ib_dev *dev, u8 port)
 {
 	struct ib_device_attr *dprops = NULL;
 	struct ib_port_attr *pprops = NULL;
 	int err = -ENOMEM;
-	int port;
 	struct ib_udata uhw = {.inlen = 0, .outlen = 0};
 
 	pprops = kmalloc(sizeof(*pprops), GFP_KERNEL);
@@ -3066,22 +3151,21 @@ static int get_port_caps(struct mlx5_ib_dev *dev)
 		goto out;
 	}
 
-	for (port = 1; port <= dev->num_ports; port++) {
-		memset(pprops, 0, sizeof(*pprops));
-		err = mlx5_ib_query_port(&dev->ib_dev, port, pprops);
-		if (err) {
-			mlx5_ib_warn(dev, "query_port %d failed %d\n",
-				     port, err);
-			break;
-		}
-		dev->mdev->port_caps[port - 1].pkey_table_len =
-						dprops->max_pkeys;
-		dev->mdev->port_caps[port - 1].gid_table_len =
-						pprops->gid_tbl_len;
-		mlx5_ib_dbg(dev, "pkey_table_len %d, gid_table_len %d\n",
-			    dprops->max_pkeys, pprops->gid_tbl_len);
+	memset(pprops, 0, sizeof(*pprops));
+	err = mlx5_ib_query_port(&dev->ib_dev, port, pprops);
+	if (err) {
+		mlx5_ib_warn(dev, "query_port %d failed %d\n",
+			     port, err);
+		goto out;
 	}
 
+	dev->mdev->port_caps[port - 1].pkey_table_len =
+					dprops->max_pkeys;
+	dev->mdev->port_caps[port - 1].gid_table_len =
+					pprops->gid_tbl_len;
+	mlx5_ib_dbg(dev, "port %d: pkey_table_len %d, gid_table_len %d\n",
+		    port, dprops->max_pkeys, pprops->gid_tbl_len);
+
 out:
 	kfree(pprops);
 	kfree(dprops);
@@ -3957,8 +4041,203 @@ mlx5_ib_get_vector_affinity(struct ib_device *ibdev, int comp_vector)
 	return mlx5_get_vector_affinity(dev->mdev, comp_vector);
 }
 
+/* The mlx5_ib_multiport_mutex should be held when calling this function */
+static void mlx5_ib_unbind_slave_port(struct mlx5_ib_dev *ibdev,
+				      struct mlx5_ib_multiport_info *mpi)
+{
+	u8 port_num = mlx5_core_native_port_num(mpi->mdev) - 1;
+	struct mlx5_ib_port *port = &ibdev->port[port_num];
+	int comps;
+	int err;
+	int i;
+
+	spin_lock(&port->mp.mpi_lock);
+	if (!mpi->ibdev) {
+		spin_unlock(&port->mp.mpi_lock);
+		return;
+	}
+	mpi->ibdev = NULL;
+
+	spin_unlock(&port->mp.mpi_lock);
+	mlx5_remove_netdev_notifier(ibdev, port_num);
+	spin_lock(&port->mp.mpi_lock);
+
+	comps = mpi->mdev_refcnt;
+	if (comps) {
+		mpi->unaffiliate = true;
+		init_completion(&mpi->unref_comp);
+		spin_unlock(&port->mp.mpi_lock);
+
+		for (i = 0; i < comps; i++)
+			wait_for_completion(&mpi->unref_comp);
+
+		spin_lock(&port->mp.mpi_lock);
+		mpi->unaffiliate = false;
+	}
+
+	port->mp.mpi = NULL;
+
+	list_add_tail(&mpi->list, &mlx5_ib_unaffiliated_port_list);
+
+	spin_unlock(&port->mp.mpi_lock);
+
+	err = mlx5_nic_vport_unaffiliate_multiport(mpi->mdev);
+
+	mlx5_ib_dbg(ibdev, "unaffiliated port %d\n", port_num + 1);
+	/* Log an error, still needed to cleanup the pointers and add
+	 * it back to the list.
+	 */
+	if (err)
+		mlx5_ib_err(ibdev, "Failed to unaffiliate port %u\n",
+			    port_num + 1);
+
+	ibdev->roce[port_num].last_port_state = IB_PORT_DOWN;
+}
+
+/* The mlx5_ib_multiport_mutex should be held when calling this function */
+static bool mlx5_ib_bind_slave_port(struct mlx5_ib_dev *ibdev,
+				    struct mlx5_ib_multiport_info *mpi)
+{
+	u8 port_num = mlx5_core_native_port_num(mpi->mdev) - 1;
+	int err;
+
+	spin_lock(&ibdev->port[port_num].mp.mpi_lock);
+	if (ibdev->port[port_num].mp.mpi) {
+		mlx5_ib_warn(ibdev, "port %d already affiliated.\n",
+			     port_num + 1);
+		spin_unlock(&ibdev->port[port_num].mp.mpi_lock);
+		return false;
+	}
+
+	ibdev->port[port_num].mp.mpi = mpi;
+	mpi->ibdev = ibdev;
+	spin_unlock(&ibdev->port[port_num].mp.mpi_lock);
+
+	err = mlx5_nic_vport_affiliate_multiport(ibdev->mdev, mpi->mdev);
+	if (err)
+		goto unbind;
+
+	err = get_port_caps(ibdev, mlx5_core_native_port_num(mpi->mdev));
+	if (err)
+		goto unbind;
+
+	err = mlx5_add_netdev_notifier(ibdev, port_num);
+	if (err) {
+		mlx5_ib_err(ibdev, "failed adding netdev notifier for port %u\n",
+			    port_num + 1);
+		goto unbind;
+	}
+
+	return true;
+
+unbind:
+	mlx5_ib_unbind_slave_port(ibdev, mpi);
+	return false;
+}
+
+static int mlx5_ib_init_multiport_master(struct mlx5_ib_dev *dev)
+{
+	int port_num = mlx5_core_native_port_num(dev->mdev) - 1;
+	enum rdma_link_layer ll = mlx5_ib_port_link_layer(&dev->ib_dev,
+							  port_num + 1);
+	struct mlx5_ib_multiport_info *mpi;
+	int err;
+	int i;
+
+	if (!mlx5_core_is_mp_master(dev->mdev) || ll != IB_LINK_LAYER_ETHERNET)
+		return 0;
+
+	err = mlx5_query_nic_vport_system_image_guid(dev->mdev,
+						     &dev->sys_image_guid);
+	if (err)
+		return err;
+
+	err = mlx5_nic_vport_enable_roce(dev->mdev);
+	if (err)
+		return err;
+
+	mutex_lock(&mlx5_ib_multiport_mutex);
+	for (i = 0; i < dev->num_ports; i++) {
+		bool bound = false;
+
+		/* build a stub multiport info struct for the native port. */
+		if (i == port_num) {
+			mpi = kzalloc(sizeof(*mpi), GFP_KERNEL);
+			if (!mpi) {
+				mutex_unlock(&mlx5_ib_multiport_mutex);
+				mlx5_nic_vport_disable_roce(dev->mdev);
+				return -ENOMEM;
+			}
+
+			mpi->is_master = true;
+			mpi->mdev = dev->mdev;
+			mpi->sys_image_guid = dev->sys_image_guid;
+			dev->port[i].mp.mpi = mpi;
+			mpi->ibdev = dev;
+			mpi = NULL;
+			continue;
+		}
+
+		list_for_each_entry(mpi, &mlx5_ib_unaffiliated_port_list,
+				    list) {
+			if (dev->sys_image_guid == mpi->sys_image_guid &&
+			    (mlx5_core_native_port_num(mpi->mdev) - 1) == i) {
+				bound = mlx5_ib_bind_slave_port(dev, mpi);
+			}
+
+			if (bound) {
+				dev_dbg(&mpi->mdev->pdev->dev, "removing port from unaffiliated list.\n");
+				mlx5_ib_dbg(dev, "port %d bound\n", i + 1);
+				list_del(&mpi->list);
+				break;
+			}
+		}
+		if (!bound) {
+			get_port_caps(dev, i + 1);
+			mlx5_ib_dbg(dev, "no free port found for port %d\n",
+				    i + 1);
+		}
+	}
+
+	list_add_tail(&dev->ib_dev_list, &mlx5_ib_dev_list);
+	mutex_unlock(&mlx5_ib_multiport_mutex);
+	return err;
+}
+
+static void mlx5_ib_cleanup_multiport_master(struct mlx5_ib_dev *dev)
+{
+	int port_num = mlx5_core_native_port_num(dev->mdev) - 1;
+	enum rdma_link_layer ll = mlx5_ib_port_link_layer(&dev->ib_dev,
+							  port_num + 1);
+	int i;
+
+	if (!mlx5_core_is_mp_master(dev->mdev) || ll != IB_LINK_LAYER_ETHERNET)
+		return;
+
+	mutex_lock(&mlx5_ib_multiport_mutex);
+	for (i = 0; i < dev->num_ports; i++) {
+		if (dev->port[i].mp.mpi) {
+			/* Destroy the native port stub */
+			if (i == port_num) {
+				kfree(dev->port[i].mp.mpi);
+				dev->port[i].mp.mpi = NULL;
+			} else {
+				mlx5_ib_dbg(dev, "unbinding port_num: %d\n", i + 1);
+				mlx5_ib_unbind_slave_port(dev, dev->port[i].mp.mpi);
+			}
+		}
+	}
+
+	mlx5_ib_dbg(dev, "removing from devlist\n");
+	list_del(&dev->ib_dev_list);
+	mutex_unlock(&mlx5_ib_multiport_mutex);
+
+	mlx5_nic_vport_disable_roce(dev->mdev);
+}
+
 static void mlx5_ib_stage_init_cleanup(struct mlx5_ib_dev *dev)
 {
+	mlx5_ib_cleanup_multiport_master(dev);
 	kfree(dev->port);
 }
 
@@ -3967,16 +4246,36 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	struct mlx5_core_dev *mdev = dev->mdev;
 	const char *name;
 	int err;
+	int i;
 
 	dev->port = kcalloc(dev->num_ports, sizeof(*dev->port),
 			    GFP_KERNEL);
 	if (!dev->port)
 		return -ENOMEM;
 
-	err = get_port_caps(dev);
+	for (i = 0; i < dev->num_ports; i++) {
+		spin_lock_init(&dev->port[i].mp.mpi_lock);
+		rwlock_init(&dev->roce[i].netdev_lock);
+	}
+
+	err = mlx5_ib_init_multiport_master(dev);
 	if (err)
 		goto err_free_port;
 
+	if (!mlx5_core_mp_enabled(mdev)) {
+		int i;
+
+		for (i = 1; i <= dev->num_ports; i++) {
+			err = get_port_caps(dev, i);
+			if (err)
+				break;
+		}
+	} else {
+		err = get_port_caps(dev, mlx5_core_native_port_num(mdev));
+	}
+	if (err)
+		goto err_mp;
+
 	if (mlx5_use_mad_ifc(dev))
 		get_ext_port_caps(dev);
 
@@ -4000,6 +4299,8 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	spin_lock_init(&dev->reset_flow_resource_lock);
 
 	return 0;
+err_mp:
+	mlx5_ib_cleanup_multiport_master(dev);
 
 err_free_port:
 	kfree(dev->port);
@@ -4146,16 +4447,16 @@ static int mlx5_ib_stage_roce_init(struct mlx5_ib_dev *dev)
 	struct mlx5_core_dev *mdev = dev->mdev;
 	enum rdma_link_layer ll;
 	int port_type_cap;
-	u8 port_num = 0;
+	u8 port_num;
 	int err;
 	int i;
 
+	port_num = mlx5_core_native_port_num(dev->mdev) - 1;
 	port_type_cap = MLX5_CAP_GEN(mdev, port_type);
 	ll = mlx5_port_type_cap_to_rdma_ll(port_type_cap);
 
 	if (ll == IB_LINK_LAYER_ETHERNET) {
 		for (i = 0; i < dev->num_ports; i++) {
-			rwlock_init(&dev->roce[i].netdev_lock);
 			dev->roce[i].dev = dev;
 			dev->roce[i].native_port_num = i + 1;
 			dev->roce[i].last_port_state = IB_PORT_DOWN;
@@ -4186,8 +4487,9 @@ static void mlx5_ib_stage_roce_cleanup(struct mlx5_ib_dev *dev)
 	struct mlx5_core_dev *mdev = dev->mdev;
 	enum rdma_link_layer ll;
 	int port_type_cap;
-	u8 port_num = 0;
+	u8 port_num;
 
+	port_num = mlx5_core_native_port_num(dev->mdev) - 1;
 	port_type_cap = MLX5_CAP_GEN(mdev, port_type);
 	ll = mlx5_port_type_cap_to_rdma_ll(port_type_cap);
 
@@ -4342,6 +4644,8 @@ static void __mlx5_ib_remove(struct mlx5_ib_dev *dev,
 	ib_dealloc_device((struct ib_device *)dev);
 }
 
+static void *mlx5_ib_add_slave_port(struct mlx5_core_dev *mdev, u8 port_num);
+
 static void *__mlx5_ib_add(struct mlx5_core_dev *mdev,
 			   struct mlx5_ib_profile *profile)
 {
@@ -4356,7 +4660,8 @@ static void *__mlx5_ib_add(struct mlx5_core_dev *mdev,
 		return NULL;
 
 	dev->mdev = mdev;
-	dev->num_ports = MLX5_CAP_GEN(mdev, num_ports);
+	dev->num_ports = max(MLX5_CAP_GEN(mdev, num_ports),
+			     MLX5_CAP_GEN(mdev, num_vhca_ports));
 
 	for (i = 0; i < MLX5_IB_STAGE_MAX; i++) {
 		if (profile->stage[i].init) {
@@ -4419,15 +4724,81 @@ static struct mlx5_ib_profile pf_profile = {
 		     NULL),
 };
 
+static void *mlx5_ib_add_slave_port(struct mlx5_core_dev *mdev, u8 port_num)
+{
+	struct mlx5_ib_multiport_info *mpi;
+	struct mlx5_ib_dev *dev;
+	bool bound = false;
+	int err;
+
+	mpi = kzalloc(sizeof(*mpi), GFP_KERNEL);
+	if (!mpi)
+		return NULL;
+
+	mpi->mdev = mdev;
+
+	err = mlx5_query_nic_vport_system_image_guid(mdev,
+						     &mpi->sys_image_guid);
+	if (err) {
+		kfree(mpi);
+		return NULL;
+	}
+
+	mutex_lock(&mlx5_ib_multiport_mutex);
+	list_for_each_entry(dev, &mlx5_ib_dev_list, ib_dev_list) {
+		if (dev->sys_image_guid == mpi->sys_image_guid)
+			bound = mlx5_ib_bind_slave_port(dev, mpi);
+
+		if (bound) {
+			rdma_roce_rescan_device(&dev->ib_dev);
+			break;
+		}
+	}
+
+	if (!bound) {
+		list_add_tail(&mpi->list, &mlx5_ib_unaffiliated_port_list);
+		dev_dbg(&mdev->pdev->dev, "no suitable IB device found to bind to, added to unaffiliated list.\n");
+	} else {
+		mlx5_ib_dbg(dev, "bound port %u\n", port_num + 1);
+	}
+	mutex_unlock(&mlx5_ib_multiport_mutex);
+
+	return mpi;
+}
+
 static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 {
+	enum rdma_link_layer ll;
+	int port_type_cap;
+
+	port_type_cap = MLX5_CAP_GEN(mdev, port_type);
+	ll = mlx5_port_type_cap_to_rdma_ll(port_type_cap);
+
+	if (mlx5_core_is_mp_slave(mdev) && ll == IB_LINK_LAYER_ETHERNET) {
+		u8 port_num = mlx5_core_native_port_num(mdev) - 1;
+
+		return mlx5_ib_add_slave_port(mdev, port_num);
+	}
+
 	return __mlx5_ib_add(mdev, &pf_profile);
 }
 
 static void mlx5_ib_remove(struct mlx5_core_dev *mdev, void *context)
 {
-	struct mlx5_ib_dev *dev = context;
+	struct mlx5_ib_multiport_info *mpi;
+	struct mlx5_ib_dev *dev;
+
+	if (mlx5_core_is_mp_slave(mdev)) {
+		mpi = context;
+		mutex_lock(&mlx5_ib_multiport_mutex);
+		if (mpi->ibdev)
+			mlx5_ib_unbind_slave_port(mpi->ibdev, mpi);
+		list_del(&mpi->list);
+		mutex_unlock(&mlx5_ib_multiport_mutex);
+		return;
+	}
 
+	dev = context;
 	__mlx5_ib_remove(dev, dev->profile, MLX5_IB_STAGE_MAX);
 }
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 1107047c6f83..594f1a1d69c4 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -634,8 +634,17 @@ struct mlx5_ib_counters {
 	u16 set_id;
 };
 
+struct mlx5_ib_multiport_info;
+
+struct mlx5_ib_multiport {
+	struct mlx5_ib_multiport_info *mpi;
+	/* To be held when accessing the multiport info */
+	spinlock_t mpi_lock;
+};
+
 struct mlx5_ib_port {
 	struct mlx5_ib_counters cnts;
+	struct mlx5_ib_multiport mp;
 };
 
 struct mlx5_roce {
@@ -748,6 +757,17 @@ struct mlx5_ib_odp {
 	void			(*sync)(struct mlx5_ib_dev *dev);
 };
 
+struct mlx5_ib_multiport_info {
+	struct list_head list;
+	struct mlx5_ib_dev *ibdev;
+	struct mlx5_core_dev *mdev;
+	struct completion unref_comp;
+	u64 sys_image_guid;
+	u32 mdev_refcnt;
+	bool is_master;
+	bool unaffiliate;
+};
+
 struct mlx5_ib_dev {
 	struct ib_device		ib_dev;
 	struct mlx5_core_dev		*mdev;
@@ -785,6 +805,8 @@ struct mlx5_ib_dev {
 	struct mutex		lb_mutex;
 	u32			user_td;
 	u8			umr_fence;
+	struct list_head	ib_dev_list;
+	u64			sys_image_guid;
 };
 
 static inline struct mlx5_ib_cq *to_mibcq(struct mlx5_core_cq *mcq)
@@ -1054,6 +1076,13 @@ void mlx5_ib_gsi_pkey_change(struct mlx5_ib_gsi_qp *gsi);
 
 int mlx5_ib_generate_wc(struct ib_cq *ibcq, struct ib_wc *wc);
 
+struct mlx5_ib_dev *mlx5_ib_get_ibdev_from_mpi(struct mlx5_ib_multiport_info *mpi);
+struct mlx5_core_dev *mlx5_ib_get_native_port_mdev(struct mlx5_ib_dev *dev,
+						   u8 ib_port_num,
+						   u8 *native_port_num);
+void mlx5_ib_put_native_port_mdev(struct mlx5_ib_dev *dev,
+				  u8 port_num);
+
 static inline void init_query_mad(struct ib_smp *mad)
 {
 	mad->base_version  = 1;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
index c4392f741c5f..c841b03c3e48 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
@@ -688,7 +688,7 @@ static inline int mlx5_fpga_conn_init_qp(struct mlx5_fpga_conn *conn)
 	MLX5_SET(qpc, qpc, st, MLX5_QP_ST_RC);
 	MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
 	MLX5_SET(qpc, qpc, primary_address_path.pkey_index, MLX5_FPGA_PKEY_INDEX);
-	MLX5_SET(qpc, qpc, primary_address_path.port, MLX5_FPGA_PORT_NUM);
+	MLX5_SET(qpc, qpc, primary_address_path.vhca_port_num, MLX5_FPGA_PORT_NUM);
 	MLX5_SET(qpc, qpc, pd, conn->fdev->conn_res.pdn);
 	MLX5_SET(qpc, qpc, cqn_snd, conn->cq.mcq.cqn);
 	MLX5_SET(qpc, qpc, cqn_rcv, conn->cq.mcq.cqn);
@@ -727,7 +727,7 @@ static inline int mlx5_fpga_conn_rtr_qp(struct mlx5_fpga_conn *conn)
 	MLX5_SET(qpc, qpc, next_rcv_psn,
 		 MLX5_GET(fpga_qpc, conn->fpga_qpc, next_send_psn));
 	MLX5_SET(qpc, qpc, primary_address_path.pkey_index, MLX5_FPGA_PKEY_INDEX);
-	MLX5_SET(qpc, qpc, primary_address_path.port, MLX5_FPGA_PORT_NUM);
+	MLX5_SET(qpc, qpc, primary_address_path.vhca_port_num, MLX5_FPGA_PORT_NUM);
 	ether_addr_copy(MLX5_ADDR_OF(qpc, qpc, primary_address_path.rmac_47_32),
 			MLX5_ADDR_OF(fpga_qpc, conn->fpga_qpc, fpga_mac_47_32));
 	MLX5_SET(qpc, qpc, primary_address_path.udp_sport,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index d2a66dc4adc6..261b95d014a0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -187,7 +187,7 @@ int mlx5i_create_underlay_qp(struct mlx5_core_dev *mdev, struct mlx5_core_qp *qp
 		 MLX5_QP_ENHANCED_ULP_STATELESS_MODE);
 
 	addr_path = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
-	MLX5_SET(ads, addr_path, port, 1);
+	MLX5_SET(ads, addr_path, vhca_port_num, 1);
 	MLX5_SET(ads, addr_path, grh, 1);
 
 	ret = mlx5_core_create_qp(mdev, qp, in, inlen);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index 916523103f16..9cb939b6a859 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -1121,3 +1121,61 @@ int mlx5_core_modify_hca_vport_context(struct mlx5_core_dev *dev,
 	return err;
 }
 EXPORT_SYMBOL_GPL(mlx5_core_modify_hca_vport_context);
+
+int mlx5_nic_vport_affiliate_multiport(struct mlx5_core_dev *master_mdev,
+				       struct mlx5_core_dev *port_mdev)
+{
+	int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+	void *in;
+	int err;
+
+	in = kvzalloc(inlen, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	err = mlx5_nic_vport_enable_roce(port_mdev);
+	if (err)
+		goto free;
+
+	MLX5_SET(modify_nic_vport_context_in, in, field_select.affiliation, 1);
+	MLX5_SET(modify_nic_vport_context_in, in,
+		 nic_vport_context.affiliated_vhca_id,
+		 MLX5_CAP_GEN(master_mdev, vhca_id));
+	MLX5_SET(modify_nic_vport_context_in, in,
+		 nic_vport_context.affiliation_criteria,
+		 MLX5_CAP_GEN(port_mdev, affiliate_nic_vport_criteria));
+
+	err = mlx5_modify_nic_vport_context(port_mdev, in, inlen);
+	if (err)
+		mlx5_nic_vport_disable_roce(port_mdev);
+
+free:
+	kvfree(in);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_nic_vport_affiliate_multiport);
+
+int mlx5_nic_vport_unaffiliate_multiport(struct mlx5_core_dev *port_mdev)
+{
+	int inlen = MLX5_ST_SZ_BYTES(modify_nic_vport_context_in);
+	void *in;
+	int err;
+
+	in = kvzalloc(inlen, GFP_KERNEL);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(modify_nic_vport_context_in, in, field_select.affiliation, 1);
+	MLX5_SET(modify_nic_vport_context_in, in,
+		 nic_vport_context.affiliated_vhca_id, 0);
+	MLX5_SET(modify_nic_vport_context_in, in,
+		 nic_vport_context.affiliation_criteria, 0);
+
+	err = mlx5_modify_nic_vport_context(port_mdev, in, inlen);
+	if (!err)
+		mlx5_nic_vport_disable_roce(port_mdev);
+
+	kvfree(in);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx5_nic_vport_unaffiliate_multiport);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index cfcb91975323..c2c78bc42432 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1219,9 +1219,29 @@ static inline bool mlx5_rl_is_supported(struct mlx5_core_dev *dev)
 	return !!(dev->priv.rl_table.max_size);
 }
 
+static inline int mlx5_core_is_mp_slave(struct mlx5_core_dev *dev)
+{
+	return MLX5_CAP_GEN(dev, affiliate_nic_vport_criteria) &&
+	       MLX5_CAP_GEN(dev, num_vhca_ports) <= 1;
+}
+
+static inline int mlx5_core_is_mp_master(struct mlx5_core_dev *dev)
+{
+	return MLX5_CAP_GEN(dev, num_vhca_ports) > 1;
+}
+
+static inline int mlx5_core_mp_enabled(struct mlx5_core_dev *dev)
+{
+	return mlx5_core_is_mp_slave(dev) ||
+	       mlx5_core_is_mp_master(dev);
+}
+
 static inline int mlx5_core_native_port_num(struct mlx5_core_dev *dev)
 {
-	return 1;
+	if (!mlx5_core_mp_enabled(dev))
+		return 1;
+
+	return MLX5_CAP_GEN(dev, native_port_num);
 }
 
 enum {
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index b1c81d7a86cb..7e88c8e7f374 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -502,7 +502,7 @@ struct mlx5_ifc_ads_bits {
 	u8         dei_cfi[0x1];
 	u8         eth_prio[0x3];
 	u8         sl[0x4];
-	u8         port[0x8];
+	u8         vhca_port_num[0x8];
 	u8         rmac_47_32[0x10];
 
 	u8         rmac_31_0[0x20];
@@ -794,7 +794,10 @@ enum {
 };
 
 struct mlx5_ifc_cmd_hca_cap_bits {
-	u8         reserved_at_0[0x80];
+	u8         reserved_at_0[0x30];
+	u8         vhca_id[0x10];
+
+	u8         reserved_at_40[0x40];
 
 	u8         log_max_srq_sz[0x8];
 	u8         log_max_qp_sz[0x8];
@@ -1066,8 +1069,11 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         reserved_at_5f8[0x3];
 	u8         log_max_xrq[0x5];
 
-	u8         reserved_at_600[0x1e];
-	u8	   sw_owner_id;
+	u8	   affiliate_nic_vport_criteria[0x8];
+	u8	   native_port_num[0x8];
+	u8	   num_vhca_ports[0x8];
+	u8	   reserved_at_618[0x6];
+	u8	   sw_owner_id[0x1];
 	u8	   reserved_at_61f[0x1e1];
 };
 
@@ -2617,7 +2623,12 @@ struct mlx5_ifc_nic_vport_context_bits {
 	u8         event_on_mc_address_change[0x1];
 	u8         event_on_uc_address_change[0x1];
 
-	u8         reserved_at_40[0xf0];
+	u8         reserved_at_40[0xc];
+
+	u8	   affiliation_criteria[0x4];
+	u8	   affiliated_vhca_id[0x10];
+
+	u8	   reserved_at_60[0xd0];
 
 	u8         mtu[0x10];
 
@@ -3260,7 +3271,8 @@ struct mlx5_ifc_set_roce_address_in_bits {
 	u8         op_mod[0x10];
 
 	u8         roce_address_index[0x10];
-	u8         reserved_at_50[0x10];
+	u8         reserved_at_50[0xc];
+	u8	   vhca_port_num[0x4];
 
 	u8         reserved_at_60[0x20];
 
@@ -3880,7 +3892,8 @@ struct mlx5_ifc_query_roce_address_in_bits {
 	u8         op_mod[0x10];
 
 	u8         roce_address_index[0x10];
-	u8         reserved_at_50[0x10];
+	u8         reserved_at_50[0xc];
+	u8	   vhca_port_num[0x4];
 
 	u8         reserved_at_60[0x20];
 };
@@ -5312,7 +5325,9 @@ struct mlx5_ifc_modify_nic_vport_context_out_bits {
 };
 
 struct mlx5_ifc_modify_nic_vport_field_select_bits {
-	u8         reserved_at_0[0x14];
+	u8         reserved_at_0[0x12];
+	u8	   affiliation[0x1];
+	u8	   reserved_at_e[0x1];
 	u8         disable_uc_local_lb[0x1];
 	u8         disable_mc_local_lb[0x1];
 	u8         node_guid[0x1];
diff --git a/include/linux/mlx5/vport.h b/include/linux/mlx5/vport.h
index aaa0bb9e7655..64e193e87394 100644
--- a/include/linux/mlx5/vport.h
+++ b/include/linux/mlx5/vport.h
@@ -116,4 +116,8 @@ int mlx5_core_modify_hca_vport_context(struct mlx5_core_dev *dev,
 				       struct mlx5_hca_vport_context *req);
 int mlx5_nic_vport_update_local_lb(struct mlx5_core_dev *mdev, bool enable);
 int mlx5_nic_vport_query_local_lb(struct mlx5_core_dev *mdev, bool *status);
+
+int mlx5_nic_vport_affiliate_multiport(struct mlx5_core_dev *master_mdev,
+				       struct mlx5_core_dev *port_mdev);
+int mlx5_nic_vport_unaffiliate_multiport(struct mlx5_core_dev *port_mdev);
 #endif /* __MLX5_VPORT_H__ */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 60c3268c8c04..c7c8032b1ecd 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3849,4 +3849,12 @@ ib_get_vector_affinity(struct ib_device *device, int comp_vector)
 
 }
 
+/**
+ * rdma_roce_rescan_device - Rescan all of the network devices in the system
+ * and add their gids, as needed, to the relevant RoCE devices.
+ *
+ * @device:         the rdma device
+ */
+void rdma_roce_rescan_device(struct ib_device *ibdev);
+
 #endif /* IB_VERBS_H */
-- 
2.15.1


* [PATCH rdma-next 07/15] IB/mlx5: Move IB event processing onto a workqueue
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (5 preceding siblings ...)
  2017-12-24 12:57   ` [PATCH rdma-next 06/15] {net,IB}/mlx5: Manage port association for multiport RoCE Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
  2017-12-24 12:57   ` [PATCH rdma-next 08/15] IB/mlx5: Implement dual port functionality in query routines Leon Romanovsky
                     ` (8 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

mlx5_ib_event can be called from atomic context, and a mutex must be
taken to look up the IB device for slave ports, so move event processing
onto a workqueue. When an IB event is received, check whether the
mlx5_core_dev is a slave port; if so, try to get the IB device it is
affiliated with. If one is found, process the event for that device,
otherwise return.
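
For reference, the deferral below follows the standard pattern for
handlers that may run in atomic context: capture the event in a small
GFP_ATOMIC allocation and hand it to an ordered workqueue, where
sleeping is allowed. A minimal sketch of that pattern, using
illustrative my_* names (the real types and functions are in the diff
below):

#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct my_event_work {
	struct work_struct work;
	unsigned long param;		/* event payload, copied by value */
};

static struct workqueue_struct *my_wq;	/* from alloc_ordered_workqueue() */

static void my_handle_event(struct work_struct *_work)
{
	struct my_event_work *work =
		container_of(_work, struct my_event_work, work);

	/* Process work->param; taking mutexes or issuing commands is fine. */
	kfree(work);
}

/* May be called from atomic context: no sleeping allocations or locks. */
static void my_event_cb(unsigned long param)
{
	struct my_event_work *work = kmalloc(sizeof(*work), GFP_ATOMIC);

	if (!work)
		return;			/* the event is dropped */

	INIT_WORK(&work->work, my_handle_event);
	work->param = param;
	queue_work(my_wq, &work->work);
}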

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c | 69 +++++++++++++++++++++++++++++++--------
 1 file changed, 56 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 745c748a79a5..cf2f12f75019 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -72,10 +72,19 @@ static char mlx5_version[] =
 	DRIVER_NAME ": Mellanox Connect-IB Infiniband driver v"
 	DRIVER_VERSION "\n";
 
+struct mlx5_ib_event_work {
+	struct work_struct	work;
+	struct mlx5_core_dev	*dev;
+	void			*context;
+	enum mlx5_dev_event	event;
+	unsigned long		param;
+};
+
 enum {
 	MLX5_ATOMIC_SIZE_QP_8BYTES = 1 << 3,
 };
 
+static struct workqueue_struct *mlx5_ib_event_wq;
 static LIST_HEAD(mlx5_ib_unaffiliated_port_list);
 static LIST_HEAD(mlx5_ib_dev_list);
 /*
@@ -3012,15 +3021,24 @@ static void delay_drop_handler(struct work_struct *work)
 	mutex_unlock(&delay_drop->lock);
 }
 
-static void mlx5_ib_event(struct mlx5_core_dev *dev, void *context,
-			  enum mlx5_dev_event event, unsigned long param)
+static void mlx5_ib_handle_event(struct work_struct *_work)
 {
-	struct mlx5_ib_dev *ibdev = (struct mlx5_ib_dev *)context;
+	struct mlx5_ib_event_work *work =
+		container_of(_work, struct mlx5_ib_event_work, work);
+	struct mlx5_ib_dev *ibdev;
 	struct ib_event ibev;
 	bool fatal = false;
 	u8 port = 0;
 
-	switch (event) {
+	if (mlx5_core_is_mp_slave(work->dev)) {
+		ibdev = mlx5_ib_get_ibdev_from_mpi(work->context);
+		if (!ibdev)
+			goto out;
+	} else {
+		ibdev = work->context;
+	}
+
+	switch (work->event) {
 	case MLX5_DEV_EVENT_SYS_ERROR:
 		ibev.event = IB_EVENT_DEVICE_FATAL;
 		mlx5_ib_handle_internal_error(ibdev);
@@ -3030,39 +3048,39 @@ static void mlx5_ib_event(struct mlx5_core_dev *dev, void *context,
 	case MLX5_DEV_EVENT_PORT_UP:
 	case MLX5_DEV_EVENT_PORT_DOWN:
 	case MLX5_DEV_EVENT_PORT_INITIALIZED:
-		port = (u8)param;
+		port = (u8)work->param;
 
 		/* In RoCE, port up/down events are handled in
 		 * mlx5_netdev_event().
 		 */
 		if (mlx5_ib_port_link_layer(&ibdev->ib_dev, port) ==
 			IB_LINK_LAYER_ETHERNET)
-			return;
+			goto out;
 
-		ibev.event = (event == MLX5_DEV_EVENT_PORT_UP) ?
+		ibev.event = (work->event == MLX5_DEV_EVENT_PORT_UP) ?
 			     IB_EVENT_PORT_ACTIVE : IB_EVENT_PORT_ERR;
 		break;
 
 	case MLX5_DEV_EVENT_LID_CHANGE:
 		ibev.event = IB_EVENT_LID_CHANGE;
-		port = (u8)param;
+		port = (u8)work->param;
 		break;
 
 	case MLX5_DEV_EVENT_PKEY_CHANGE:
 		ibev.event = IB_EVENT_PKEY_CHANGE;
-		port = (u8)param;
+		port = (u8)work->param;
 
 		schedule_work(&ibdev->devr.ports[port - 1].pkey_change_work);
 		break;
 
 	case MLX5_DEV_EVENT_GUID_CHANGE:
 		ibev.event = IB_EVENT_GID_CHANGE;
-		port = (u8)param;
+		port = (u8)work->param;
 		break;
 
 	case MLX5_DEV_EVENT_CLIENT_REREG:
 		ibev.event = IB_EVENT_CLIENT_REREGISTER;
-		port = (u8)param;
+		port = (u8)work->param;
 		break;
 	case MLX5_DEV_EVENT_DELAY_DROP_TIMEOUT:
 		schedule_work(&ibdev->delay_drop.delay_drop_work);
@@ -3084,9 +3102,29 @@ static void mlx5_ib_event(struct mlx5_core_dev *dev, void *context,
 
 	if (fatal)
 		ibdev->ib_active = false;
-
 out:
-	return;
+	kfree(work);
+}
+
+static void mlx5_ib_event(struct mlx5_core_dev *dev, void *context,
+			  enum mlx5_dev_event event, unsigned long param)
+{
+	struct mlx5_ib_event_work *work;
+
+	work = kmalloc(sizeof(*work), GFP_ATOMIC);
+	if (work) {
+		INIT_WORK(&work->work, mlx5_ib_handle_event);
+		work->dev = dev;
+		work->param = param;
+		work->context = context;
+		work->event = event;
+
+		queue_work(mlx5_ib_event_wq, &work->work);
+		return;
+	}
+
+	dev_warn(&dev->pdev->dev, "%s: mlx5_dev_event: %d, with param: %lu dropped, couldn't allocate memory.\n",
+		 __func__, event, param);
 }
 
 static int set_has_smi_cap(struct mlx5_ib_dev *dev)
@@ -4816,6 +4854,10 @@ static int __init mlx5_ib_init(void)
 {
 	int err;
 
+	mlx5_ib_event_wq = alloc_ordered_workqueue("mlx5_ib_event_wq", 0);
+	if (!mlx5_ib_event_wq)
+		return -ENOMEM;
+
 	mlx5_ib_odp_init();
 
 	err = mlx5_register_interface(&mlx5_ib_interface);
@@ -4826,6 +4868,7 @@ static int __init mlx5_ib_init(void)
 static void __exit mlx5_ib_cleanup(void)
 {
 	mlx5_unregister_interface(&mlx5_ib_interface);
+	destroy_workqueue(mlx5_ib_event_wq);
 }
 
 module_init(mlx5_ib_init);
-- 
2.15.1


* [PATCH rdma-next 08/15] IB/mlx5: Implement dual port functionality in query routines
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (6 preceding siblings ...)
  2017-12-24 12:57   ` [PATCH rdma-next 07/15] IB/mlx5: Move IB event processing onto a workqueue Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
  2017-12-24 12:57   ` [PATCH rdma-next 09/15] IB/mlx5: Change debugfs to have per port contents Leon Romanovsky
                     ` (7 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Port operations must be routed to their native mlx5_core_dev. A
multiport RoCE device registers itself as having two ports even before a
second port is affiliated. If an unaffiliated port is queried, use
capability information from the master port; these values are the same.
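
The affected query routines below all take the same shape; a minimal
sketch of that shape, using an illustrative helper name
(my_query_port_attr), is:

static int my_query_port_attr(struct mlx5_ib_dev *dev, u8 port_num)
{
	struct mlx5_core_dev *mdev;
	bool put_mdev = true;
	u8 mdev_port_num;
	int err = 0;

	mdev = mlx5_ib_get_native_port_mdev(dev, port_num, &mdev_port_num);
	if (!mdev) {
		/* Port not affiliated yet: fall back to the master mdev
		 * (same capability values) and remember not to put a
		 * reference that was never taken.
		 */
		put_mdev = false;
		mdev = dev->mdev;
		mdev_port_num = 1;
	}

	/* issue the real query against (mdev, mdev_port_num) here */

	if (put_mdev)
		mlx5_ib_put_native_port_mdev(dev, port_num);
	return err;
}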

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c | 102 +++++++++++++++++++++++++++++++-------
 1 file changed, 85 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index cf2f12f75019..969981896ae6 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -361,16 +361,30 @@ static int mlx5_query_port_roce(struct ib_device *device, u8 port_num,
 	struct mlx5_core_dev *mdev = dev->mdev;
 	struct net_device *ndev, *upper;
 	enum ib_mtu ndev_ib_mtu;
+	bool put_mdev = true;
 	u16 qkey_viol_cntr;
 	u32 eth_prot_oper;
+	u8 mdev_port_num;
 	int err;
 
+	mdev = mlx5_ib_get_native_port_mdev(dev, port_num, &mdev_port_num);
+	if (!mdev) {
+		/* This means the port isn't affiliated yet. Get the
+		 * info for the master port instead.
+		 */
+		put_mdev = false;
+		mdev = dev->mdev;
+		mdev_port_num = 1;
+		port_num = 1;
+	}
+
 	/* Possible bad flows are checked before filling out props so in case
 	 * of an error it will still be zeroed out.
 	 */
-	err = mlx5_query_port_eth_proto_oper(mdev, &eth_prot_oper, port_num);
+	err = mlx5_query_port_eth_proto_oper(mdev, &eth_prot_oper,
+					     mdev_port_num);
 	if (err)
-		return err;
+		goto out;
 
 	translate_eth_proto_oper(eth_prot_oper, &props->active_speed,
 				 &props->active_width);
@@ -386,12 +400,16 @@ static int mlx5_query_port_roce(struct ib_device *device, u8 port_num,
 	props->state            = IB_PORT_DOWN;
 	props->phys_state       = 3;
 
-	mlx5_query_nic_vport_qkey_viol_cntr(dev->mdev, &qkey_viol_cntr);
+	mlx5_query_nic_vport_qkey_viol_cntr(mdev, &qkey_viol_cntr);
 	props->qkey_viol_cntr = qkey_viol_cntr;
 
+	/* If this is a stub query for an unaffiliated port stop here */
+	if (!put_mdev)
+		goto out;
+
 	ndev = mlx5_ib_get_netdev(device, port_num);
 	if (!ndev)
-		return 0;
+		goto out;
 
 	if (mlx5_lag_is_active(dev->mdev)) {
 		rcu_read_lock();
@@ -414,7 +432,10 @@ static int mlx5_query_port_roce(struct ib_device *device, u8 port_num,
 	dev_put(ndev);
 
 	props->active_mtu	= min(props->max_mtu, ndev_ib_mtu);
-	return 0;
+out:
+	if (put_mdev)
+		mlx5_ib_put_native_port_mdev(dev, port_num);
+	return err;
 }
 
 static int set_roce_addr(struct mlx5_ib_dev *dev, u8 port_num,
@@ -1200,7 +1221,22 @@ int mlx5_ib_query_port(struct ib_device *ibdev, u8 port,
 	}
 
 	if (!ret && props) {
-		count = mlx5_core_reserved_gids_count(to_mdev(ibdev)->mdev);
+		struct mlx5_ib_dev *dev = to_mdev(ibdev);
+		struct mlx5_core_dev *mdev;
+		bool put_mdev = true;
+
+		mdev = mlx5_ib_get_native_port_mdev(dev, port, NULL);
+		if (!mdev) {
+			/* If the port isn't affiliated yet query the master.
+			 * The master and slave will have the same values.
+			 */
+			mdev = dev->mdev;
+			port = 1;
+			put_mdev = false;
+		}
+		count = mlx5_core_reserved_gids_count(mdev);
+		if (put_mdev)
+			mlx5_ib_put_native_port_mdev(dev, port);
 		props->gid_tbl_len -= count;
 	}
 	return ret;
@@ -1225,20 +1261,43 @@ static int mlx5_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
 
 }
 
-static int mlx5_ib_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
-			      u16 *pkey)
+static int mlx5_query_hca_nic_pkey(struct ib_device *ibdev, u8 port,
+				   u16 index, u16 *pkey)
 {
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
-	struct mlx5_core_dev *mdev = dev->mdev;
+	struct mlx5_core_dev *mdev;
+	bool put_mdev = true;
+	u8 mdev_port_num;
+	int err;
 
+	mdev = mlx5_ib_get_native_port_mdev(dev, port, &mdev_port_num);
+	if (!mdev) {
+		/* The port isn't affiliated yet, get the PKey from the master
+		 * port. For RoCE the PKey tables will be the same.
+		 */
+		put_mdev = false;
+		mdev = dev->mdev;
+		mdev_port_num = 1;
+	}
+
+	err = mlx5_query_hca_vport_pkey(mdev, 0, mdev_port_num, 0,
+					index, pkey);
+	if (put_mdev)
+		mlx5_ib_put_native_port_mdev(dev, port);
+
+	return err;
+}
+
+static int mlx5_ib_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
+			      u16 *pkey)
+{
 	switch (mlx5_get_vport_access_method(ibdev)) {
 	case MLX5_VPORT_ACCESS_METHOD_MAD:
 		return mlx5_query_mad_ifc_pkey(ibdev, port, index, pkey);
 
 	case MLX5_VPORT_ACCESS_METHOD_HCA:
 	case MLX5_VPORT_ACCESS_METHOD_NIC:
-		return mlx5_query_hca_vport_pkey(mdev, 0, port,  0, index,
-						 pkey);
+		return mlx5_query_hca_nic_pkey(ibdev, port, index, pkey);
 	default:
 		return -EINVAL;
 	}
@@ -1277,23 +1336,32 @@ static int set_port_caps_atomic(struct mlx5_ib_dev *dev, u8 port_num, u32 mask,
 				u32 value)
 {
 	struct mlx5_hca_vport_context ctx = {};
+	struct mlx5_core_dev *mdev;
+	u8 mdev_port_num;
 	int err;
 
-	err = mlx5_query_hca_vport_context(dev->mdev, 0,
-					   port_num, 0, &ctx);
+	mdev = mlx5_ib_get_native_port_mdev(dev, port_num, &mdev_port_num);
+	if (!mdev)
+		return -ENODEV;
+
+	err = mlx5_query_hca_vport_context(mdev, 0, mdev_port_num, 0, &ctx);
 	if (err)
-		return err;
+		goto out;
 
 	if (~ctx.cap_mask1_perm & mask) {
 		mlx5_ib_warn(dev, "trying to change bitmask 0x%X but change supported 0x%X\n",
 			     mask, ctx.cap_mask1_perm);
-		return -EINVAL;
+		err = -EINVAL;
+		goto out;
 	}
 
 	ctx.cap_mask1 = value;
 	ctx.cap_mask1_perm = mask;
-	err = mlx5_core_modify_hca_vport_context(dev->mdev, 0,
-						 port_num, 0, &ctx);
+	err = mlx5_core_modify_hca_vport_context(mdev, 0, mdev_port_num,
+						 0, &ctx);
+
+out:
+	mlx5_ib_put_native_port_mdev(dev, port_num);
 
 	return err;
 }
-- 
2.15.1


* [PATCH rdma-next 09/15] IB/mlx5: Change debugfs to have per port contents
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (7 preceding siblings ...)
  2017-12-24 12:57   ` [PATCH rdma-next 08/15] IB/mlx5: Implement dual port functionality in query routines Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
  2017-12-24 12:57   ` [PATCH rdma-next 10/15] IB/mlx5: Update counter implementation for dual port RoCE Leon Romanovsky
                     ` (6 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When a single IB (RoCE) device has multiple ports, make the debugfs
entries available per port.
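
A minimal sketch of the per-port layout, with an illustrative helper
name (the code below stores the state in dev->port[port_num] and
creates the directory under the native mdev's debugfs root):

static struct dentry *my_port_cc_root(struct mlx5_ib_dev *dev, u8 port_num)
{
	struct mlx5_core_dev *mdev;
	struct dentry *root;

	/* the lookup helper takes a 1-based port number */
	mdev = mlx5_ib_get_native_port_mdev(dev, port_num + 1, NULL);
	if (!mdev)
		return NULL;	/* port not affiliated yet, nothing to expose */

	root = debugfs_create_dir("cc_params", mdev->priv.dbg_root);
	mlx5_ib_put_native_port_mdev(dev, port_num + 1);
	return root;
}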

Signed-off-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/cong.c    | 83 +++++++++++++++++++++++++-----------
 drivers/infiniband/hw/mlx5/main.c    | 12 +++++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  7 +--
 3 files changed, 73 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cong.c b/drivers/infiniband/hw/mlx5/cong.c
index 2d32b519bb61..985fa2637390 100644
--- a/drivers/infiniband/hw/mlx5/cong.c
+++ b/drivers/infiniband/hw/mlx5/cong.c
@@ -247,21 +247,30 @@ static void mlx5_ib_set_cc_param_mask_val(void *field, int offset,
 	}
 }
 
-static int mlx5_ib_get_cc_params(struct mlx5_ib_dev *dev, int offset, u32 *var)
+static int mlx5_ib_get_cc_params(struct mlx5_ib_dev *dev, u8 port_num,
+				 int offset, u32 *var)
 {
 	int outlen = MLX5_ST_SZ_BYTES(query_cong_params_out);
 	void *out;
 	void *field;
 	int err;
 	enum mlx5_ib_cong_node_type node;
+	struct mlx5_core_dev *mdev;
+
+	/* Takes a 1-based port number */
+	mdev = mlx5_ib_get_native_port_mdev(dev, port_num + 1, NULL);
+	if (!mdev)
+		return -ENODEV;
 
 	out = kvzalloc(outlen, GFP_KERNEL);
-	if (!out)
-		return -ENOMEM;
+	if (!out) {
+		err = -ENOMEM;
+		goto alloc_err;
+	}
 
 	node = mlx5_ib_param_to_node(offset);
 
-	err = mlx5_cmd_query_cong_params(dev->mdev, node, out, outlen);
+	err = mlx5_cmd_query_cong_params(mdev, node, out, outlen);
 	if (err)
 		goto free;
 
@@ -270,21 +279,32 @@ static int mlx5_ib_get_cc_params(struct mlx5_ib_dev *dev, int offset, u32 *var)
 
 free:
 	kvfree(out);
+alloc_err:
+	mlx5_ib_put_native_port_mdev(dev, port_num + 1);
 	return err;
 }
 
-static int mlx5_ib_set_cc_params(struct mlx5_ib_dev *dev, int offset, u32 var)
+static int mlx5_ib_set_cc_params(struct mlx5_ib_dev *dev, u8 port_num,
+				 int offset, u32 var)
 {
 	int inlen = MLX5_ST_SZ_BYTES(modify_cong_params_in);
 	void *in;
 	void *field;
 	enum mlx5_ib_cong_node_type node;
+	struct mlx5_core_dev *mdev;
 	u32 attr_mask = 0;
 	int err;
 
+	/* Takes a 1-based port number */
+	mdev = mlx5_ib_get_native_port_mdev(dev, port_num + 1, NULL);
+	if (!mdev)
+		return -ENODEV;
+
 	in = kvzalloc(inlen, GFP_KERNEL);
-	if (!in)
-		return -ENOMEM;
+	if (!in) {
+		err = -ENOMEM;
+		goto alloc_err;
+	}
 
 	MLX5_SET(modify_cong_params_in, in, opcode,
 		 MLX5_CMD_OP_MODIFY_CONG_PARAMS);
@@ -299,8 +319,10 @@ static int mlx5_ib_set_cc_params(struct mlx5_ib_dev *dev, int offset, u32 var)
 	MLX5_SET(field_select_r_roce_rp, field, field_select_r_roce_rp,
 		 attr_mask);
 
-	err = mlx5_cmd_modify_cong_params(dev->mdev, in, inlen);
+	err = mlx5_cmd_modify_cong_params(mdev, in, inlen);
 	kvfree(in);
+alloc_err:
+	mlx5_ib_put_native_port_mdev(dev, port_num + 1);
 	return err;
 }
 
@@ -324,7 +346,7 @@ static ssize_t set_param(struct file *filp, const char __user *buf,
 	if (kstrtou32(lbuf, 0, &var))
 		return -EINVAL;
 
-	ret = mlx5_ib_set_cc_params(param->dev, offset, var);
+	ret = mlx5_ib_set_cc_params(param->dev, param->port_num, offset, var);
 	return ret ? ret : count;
 }
 
@@ -340,7 +362,7 @@ static ssize_t get_param(struct file *filp, char __user *buf, size_t count,
 	if (*pos)
 		return 0;
 
-	ret = mlx5_ib_get_cc_params(param->dev, offset, &var);
+	ret = mlx5_ib_get_cc_params(param->dev, param->port_num, offset, &var);
 	if (ret)
 		return ret;
 
@@ -362,44 +384,51 @@ static const struct file_operations dbg_cc_fops = {
 	.read	= get_param,
 };
 
-void mlx5_ib_cleanup_cong_debugfs(struct mlx5_ib_dev *dev)
+void mlx5_ib_cleanup_cong_debugfs(struct mlx5_ib_dev *dev, u8 port_num)
 {
 	if (!mlx5_debugfs_root ||
-	    !dev->dbg_cc_params ||
-	    !dev->dbg_cc_params->root)
+	    !dev->port[port_num].dbg_cc_params ||
+	    !dev->port[port_num].dbg_cc_params->root)
 		return;
 
-	debugfs_remove_recursive(dev->dbg_cc_params->root);
-	kfree(dev->dbg_cc_params);
-	dev->dbg_cc_params = NULL;
+	debugfs_remove_recursive(dev->port[port_num].dbg_cc_params->root);
+	kfree(dev->port[port_num].dbg_cc_params);
+	dev->port[port_num].dbg_cc_params = NULL;
 }
 
-int mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev)
+int mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev, u8 port_num)
 {
 	struct mlx5_ib_dbg_cc_params *dbg_cc_params;
+	struct mlx5_core_dev *mdev;
 	int i;
 
 	if (!mlx5_debugfs_root)
 		goto out;
 
-	if (!MLX5_CAP_GEN(dev->mdev, cc_query_allowed) ||
-	    !MLX5_CAP_GEN(dev->mdev, cc_modify_allowed))
+	/* Takes a 1-based port number */
+	mdev = mlx5_ib_get_native_port_mdev(dev, port_num + 1, NULL);
+	if (!mdev)
 		goto out;
 
+	if (!MLX5_CAP_GEN(mdev, cc_query_allowed) ||
+	    !MLX5_CAP_GEN(mdev, cc_modify_allowed))
+		goto put_mdev;
+
 	dbg_cc_params = kzalloc(sizeof(*dbg_cc_params), GFP_KERNEL);
 	if (!dbg_cc_params)
-		goto out;
+		goto err;
 
-	dev->dbg_cc_params = dbg_cc_params;
+	dev->port[port_num].dbg_cc_params = dbg_cc_params;
 
 	dbg_cc_params->root = debugfs_create_dir("cc_params",
-						 dev->mdev->priv.dbg_root);
+						 mdev->priv.dbg_root);
 	if (!dbg_cc_params->root)
 		goto err;
 
 	for (i = 0; i < MLX5_IB_DBG_CC_MAX; i++) {
 		dbg_cc_params->params[i].offset = i;
 		dbg_cc_params->params[i].dev = dev;
+		dbg_cc_params->params[i].port_num = port_num;
 		dbg_cc_params->params[i].dentry =
 			debugfs_create_file(mlx5_ib_dbg_cc_name[i],
 					    0600, dbg_cc_params->root,
@@ -408,11 +437,17 @@ int mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev)
 		if (!dbg_cc_params->params[i].dentry)
 			goto err;
 	}
-out:	return 0;
+
+put_mdev:
+	mlx5_ib_put_native_port_mdev(dev, port_num + 1);
+out:
+	return 0;
 
 err:
 	mlx5_ib_warn(dev, "cong debugfs failure\n");
-	mlx5_ib_cleanup_cong_debugfs(dev);
+	mlx5_ib_cleanup_cong_debugfs(dev, port_num);
+	mlx5_ib_put_native_port_mdev(dev, port_num + 1);
+
 	/*
 	 * We don't want to fail driver if debugfs failed to initialize,
 	 * so we are not forwarding error to the user.
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 969981896ae6..d2fd7695fbe0 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4157,6 +4157,8 @@ static void mlx5_ib_unbind_slave_port(struct mlx5_ib_dev *ibdev,
 	int err;
 	int i;
 
+	mlx5_ib_cleanup_cong_debugfs(ibdev, port_num);
+
 	spin_lock(&port->mp.mpi_lock);
 	if (!mpi->ibdev) {
 		spin_unlock(&port->mp.mpi_lock);
@@ -4234,6 +4236,10 @@ static bool mlx5_ib_bind_slave_port(struct mlx5_ib_dev *ibdev,
 		goto unbind;
 	}
 
+	err = mlx5_ib_init_cong_debugfs(ibdev, port_num);
+	if (err)
+		goto unbind;
+
 	return true;
 
 unbind:
@@ -4647,12 +4653,14 @@ static void mlx5_ib_stage_counters_cleanup(struct mlx5_ib_dev *dev)
 
 static int mlx5_ib_stage_cong_debugfs_init(struct mlx5_ib_dev *dev)
 {
-	return mlx5_ib_init_cong_debugfs(dev);
+	return mlx5_ib_init_cong_debugfs(dev,
+					 mlx5_core_native_port_num(dev->mdev) - 1);
 }
 
 static void mlx5_ib_stage_cong_debugfs_cleanup(struct mlx5_ib_dev *dev)
 {
-	mlx5_ib_cleanup_cong_debugfs(dev);
+	mlx5_ib_cleanup_cong_debugfs(dev,
+				     mlx5_core_native_port_num(dev->mdev) - 1);
 }
 
 static int mlx5_ib_stage_uar_init(struct mlx5_ib_dev *dev)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 594f1a1d69c4..518bfba88b2b 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -645,6 +645,7 @@ struct mlx5_ib_multiport {
 struct mlx5_ib_port {
 	struct mlx5_ib_counters cnts;
 	struct mlx5_ib_multiport mp;
+	struct mlx5_ib_dbg_cc_params	*dbg_cc_params;
 };
 
 struct mlx5_roce {
@@ -664,6 +665,7 @@ struct mlx5_ib_dbg_param {
 	int			offset;
 	struct mlx5_ib_dev	*dev;
 	struct dentry		*dentry;
+	u8			port_num;
 };
 
 enum mlx5_ib_dbg_cc_types {
@@ -798,7 +800,6 @@ struct mlx5_ib_dev {
 	struct mlx5_sq_bfreg	bfreg;
 	struct mlx5_sq_bfreg	fp_bfreg;
 	struct mlx5_ib_delay_drop	delay_drop;
-	struct mlx5_ib_dbg_cc_params	*dbg_cc_params;
 	struct mlx5_ib_profile	*profile;
 
 	/* protect the user_td */
@@ -1056,8 +1057,8 @@ __be16 mlx5_get_roce_udp_sport(struct mlx5_ib_dev *dev, u8 port_num,
 int mlx5_get_roce_gid_type(struct mlx5_ib_dev *dev, u8 port_num,
 			   int index, enum ib_gid_type *gid_type);
 
-void mlx5_ib_cleanup_cong_debugfs(struct mlx5_ib_dev *dev);
-int mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev);
+void mlx5_ib_cleanup_cong_debugfs(struct mlx5_ib_dev *dev, u8 port_num);
+int mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev, u8 port_num);
 
 /* GSI QP helper functions */
 struct ib_qp *mlx5_ib_gsi_create_qp(struct ib_pd *pd,
-- 
2.15.1


* [PATCH rdma-next 10/15] IB/mlx5: Update counter implementation for dual port RoCE
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (8 preceding siblings ...)
  2017-12-24 12:57   ` [PATCH rdma-next 09/15] IB/mlx5: Change debugfs to have per port contents Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
  2017-12-24 12:57   ` [PATCH rdma-next 11/15] {net,IB}/mlx5: Change set_roce_gid to take a port number Leon Romanovsky
                     ` (5 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Update the counter interface for multiple ports. Some counter sets
always come from the primary device.

Port-specific counters should be accessed through the mlx5_core_dev of
the respective port, not always through the IB master mdev.
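
Sketched with an illustrative helper name (the real logic is in
mlx5_ib_get_hw_stats below): Q counters always go to the master mdev,
while congestion counters go to the native port mdev and are skipped as
long as the port is unaffiliated.

static int my_read_hw_stats(struct mlx5_ib_dev *dev, u8 port_num)
{
	struct mlx5_core_dev *mdev;
	u8 mdev_port_num;

	/* Q counters are allocated per IB device: always query the master. */
	/* mlx5_core_query_q_counter(dev->mdev, set_id, 0, out, outlen); */

	/* Congestion counters are per physical port: use the native mdev. */
	mdev = mlx5_ib_get_native_port_mdev(dev, port_num, &mdev_port_num);
	if (!mdev)
		return 0;	/* unaffiliated port is down, counters are zero */

	/* mlx5_cmd_query_cong_counter(mdev, false, out, outlen); */
	mlx5_ib_put_native_port_mdev(dev, port_num);
	return 0;
}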

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c    | 75 +++++++++++++++++++++---------------
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
 2 files changed, 45 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index d2fd7695fbe0..e61c0b2f29ba 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3765,11 +3765,12 @@ static const struct mlx5_ib_counter extended_err_cnts[] = {
 
 static void mlx5_ib_dealloc_counters(struct mlx5_ib_dev *dev)
 {
-	unsigned int i;
+	int i;
 
 	for (i = 0; i < dev->num_ports; i++) {
-		mlx5_core_dealloc_q_counter(dev->mdev,
-					    dev->port[i].cnts.set_id);
+		if (dev->port[i].cnts.set_id)
+			mlx5_core_dealloc_q_counter(dev->mdev,
+						    dev->port[i].cnts.set_id);
 		kfree(dev->port[i].cnts.names);
 		kfree(dev->port[i].cnts.offsets);
 	}
@@ -3811,6 +3812,7 @@ static int __mlx5_ib_alloc_counters(struct mlx5_ib_dev *dev,
 
 err_names:
 	kfree(cnts->names);
+	cnts->names = NULL;
 	return -ENOMEM;
 }
 
@@ -3857,37 +3859,33 @@ static void mlx5_ib_fill_counters(struct mlx5_ib_dev *dev,
 
 static int mlx5_ib_alloc_counters(struct mlx5_ib_dev *dev)
 {
+	int err = 0;
 	int i;
-	int ret;
 
 	for (i = 0; i < dev->num_ports; i++) {
-		struct mlx5_ib_port *port = &dev->port[i];
+		err = __mlx5_ib_alloc_counters(dev, &dev->port[i].cnts);
+		if (err)
+			goto err_alloc;
+
+		mlx5_ib_fill_counters(dev, dev->port[i].cnts.names,
+				      dev->port[i].cnts.offsets);
 
-		ret = mlx5_core_alloc_q_counter(dev->mdev,
-						&port->cnts.set_id);
-		if (ret) {
+		err = mlx5_core_alloc_q_counter(dev->mdev,
+						&dev->port[i].cnts.set_id);
+		if (err) {
 			mlx5_ib_warn(dev,
 				     "couldn't allocate queue counter for port %d, err %d\n",
-				     i + 1, ret);
-			goto dealloc_counters;
+				     i + 1, err);
+			goto err_alloc;
 		}
-
-		ret = __mlx5_ib_alloc_counters(dev, &port->cnts);
-		if (ret)
-			goto dealloc_counters;
-
-		mlx5_ib_fill_counters(dev, port->cnts.names,
-				      port->cnts.offsets);
+		dev->port[i].cnts.set_id_valid = true;
 	}
 
 	return 0;
 
-dealloc_counters:
-	while (--i >= 0)
-		mlx5_core_dealloc_q_counter(dev->mdev,
-					    dev->port[i].cnts.set_id);
-
-	return ret;
+err_alloc:
+	mlx5_ib_dealloc_counters(dev);
+	return err;
 }
 
 static struct rdma_hw_stats *mlx5_ib_alloc_hw_stats(struct ib_device *ibdev,
@@ -3906,7 +3904,7 @@ static struct rdma_hw_stats *mlx5_ib_alloc_hw_stats(struct ib_device *ibdev,
 					  RDMA_HW_STATS_DEFAULT_LIFESPAN);
 }
 
-static int mlx5_ib_query_q_counters(struct mlx5_ib_dev *dev,
+static int mlx5_ib_query_q_counters(struct mlx5_core_dev *mdev,
 				    struct mlx5_ib_port *port,
 				    struct rdma_hw_stats *stats)
 {
@@ -3919,7 +3917,7 @@ static int mlx5_ib_query_q_counters(struct mlx5_ib_dev *dev,
 	if (!out)
 		return -ENOMEM;
 
-	ret = mlx5_core_query_q_counter(dev->mdev,
+	ret = mlx5_core_query_q_counter(mdev,
 					port->cnts.set_id, 0,
 					out, outlen);
 	if (ret)
@@ -3935,7 +3933,7 @@ static int mlx5_ib_query_q_counters(struct mlx5_ib_dev *dev,
 	return ret;
 }
 
-static int mlx5_ib_query_cong_counters(struct mlx5_ib_dev *dev,
+static int mlx5_ib_query_cong_counters(struct mlx5_core_dev *mdev,
 				       struct mlx5_ib_port *port,
 				       struct rdma_hw_stats *stats)
 {
@@ -3948,7 +3946,7 @@ static int mlx5_ib_query_cong_counters(struct mlx5_ib_dev *dev,
 	if (!out)
 		return -ENOMEM;
 
-	ret = mlx5_cmd_query_cong_counter(dev->mdev, false, out, outlen);
+	ret = mlx5_cmd_query_cong_counter(mdev, false, out, outlen);
 	if (ret)
 		goto free;
 
@@ -3969,23 +3967,38 @@ static int mlx5_ib_get_hw_stats(struct ib_device *ibdev,
 {
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	struct mlx5_ib_port *port = &dev->port[port_num - 1];
+	struct mlx5_core_dev *mdev;
 	int ret, num_counters;
+	u8 mdev_port_num;
 
 	if (!stats)
 		return -EINVAL;
 
-	ret = mlx5_ib_query_q_counters(dev, port, stats);
+	num_counters = port->cnts.num_q_counters + port->cnts.num_cong_counters;
+
+	/* q_counters are per IB device, query the master mdev */
+	ret = mlx5_ib_query_q_counters(dev->mdev, port, stats);
 	if (ret)
 		return ret;
-	num_counters = port->cnts.num_q_counters;
 
 	if (MLX5_CAP_GEN(dev->mdev, cc_query_allowed)) {
-		ret = mlx5_ib_query_cong_counters(dev, port, stats);
+		mdev = mlx5_ib_get_native_port_mdev(dev, port_num,
+						    &mdev_port_num);
+		if (!mdev) {
+			/* If port is not affiliated yet, its in down state
+			 * which doesn't have any counters yet, so it would be
+			 * zero. So no need to read from the HCA.
+			 */
+			goto done;
+		}
+
+		ret = mlx5_ib_query_cong_counters(mdev, port, stats);
+		mlx5_ib_put_native_port_mdev(dev, port_num);
 		if (ret)
 			return ret;
-		num_counters += port->cnts.num_cong_counters;
 	}
 
+done:
 	return num_counters;
 }
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 518bfba88b2b..196b55b90c60 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -632,6 +632,7 @@ struct mlx5_ib_counters {
 	u32 num_q_counters;
 	u32 num_cong_counters;
 	u16 set_id;
+	bool set_id_valid;
 };
 
 struct mlx5_ib_multiport_info;
-- 
2.15.1


* [PATCH rdma-next 11/15] {net,IB}/mlx5: Change set_roce_gid to take a port number
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (9 preceding siblings ...)
  2017-12-24 12:57   ` [PATCH rdma-next 10/15] IB/mlx5: Update counter implementation for dual port RoCE Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
  2017-12-24 12:57   ` [PATCH rdma-next 12/15] IB/mlx5: Route MADs for dual port RoCE Leon Romanovsky
                     ` (4 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When in dual port mode, setting a RoCE GID for any port flows through
the master port's mlx5_core_dev. Provide an interface to specify the
port when sending this command.
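
An illustrative caller of the extended interface (the real IB-side
caller is set_roce_addr in the diff; the wrapper name here is made up):

static int my_program_gid(struct mlx5_core_dev *master_mdev,
			  unsigned int index, const u8 *gid, const u8 *mac,
			  u8 port_num)
{
	/* port_num is 1-based; on devices without num_vhca_ports the
	 * extra argument is ignored, so single-port behavior is unchanged.
	 */
	return mlx5_core_roce_gid_set(master_mdev, index, MLX5_ROCE_VERSION_2,
				      MLX5_ROCE_L3_TYPE_IPV6, gid, mac,
				      false, 0, port_num);
}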

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c                   | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c | 7 ++++---
 drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c   | 5 ++++-
 include/linux/mlx5/driver.h                         | 2 +-
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index e61c0b2f29ba..261c2a8db6b3 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -477,7 +477,7 @@ static int set_roce_addr(struct mlx5_ib_dev *dev, u8 port_num,
 
 	return mlx5_core_roce_gid_set(dev->mdev, index, roce_version,
 				      roce_l3_type, gid->raw, mac, vlan,
-				      vlan_id);
+				      vlan_id, port_num);
 }
 
 static int mlx5_ib_add_gid(struct ib_device *device, u8 port_num,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
index c841b03c3e48..e6175f8ac0e4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
@@ -888,7 +888,8 @@ struct mlx5_fpga_conn *mlx5_fpga_conn_create(struct mlx5_fpga_device *fdev,
 	err = mlx5_core_roce_gid_set(fdev->mdev, conn->qp.sgid_index,
 				     MLX5_ROCE_VERSION_2,
 				     MLX5_ROCE_L3_TYPE_IPV6,
-				     remote_ip, remote_mac, true, 0);
+				     remote_ip, remote_mac, true, 0,
+				     MLX5_FPGA_PORT_NUM);
 	if (err) {
 		mlx5_fpga_err(fdev, "Failed to set SGID: %d\n", err);
 		ret = ERR_PTR(err);
@@ -954,7 +955,7 @@ struct mlx5_fpga_conn *mlx5_fpga_conn_create(struct mlx5_fpga_device *fdev,
 	mlx5_fpga_conn_destroy_cq(conn);
 err_gid:
 	mlx5_core_roce_gid_set(fdev->mdev, conn->qp.sgid_index, 0, 0, NULL,
-			       NULL, false, 0);
+			       NULL, false, 0, MLX5_FPGA_PORT_NUM);
 err_rsvd_gid:
 	mlx5_core_reserved_gid_free(fdev->mdev, conn->qp.sgid_index);
 err:
@@ -982,7 +983,7 @@ void mlx5_fpga_conn_destroy(struct mlx5_fpga_conn *conn)
 	mlx5_fpga_conn_destroy_cq(conn);
 
 	mlx5_core_roce_gid_set(conn->fdev->mdev, conn->qp.sgid_index, 0, 0,
-			       NULL, NULL, false, 0);
+			       NULL, NULL, false, 0, MLX5_FPGA_PORT_NUM);
 	mlx5_core_reserved_gid_free(conn->fdev->mdev, conn->qp.sgid_index);
 	kfree(conn);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c
index 573f59f46d41..7722a3f9bb68 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c
@@ -121,7 +121,7 @@ EXPORT_SYMBOL_GPL(mlx5_core_reserved_gids_count);
 
 int mlx5_core_roce_gid_set(struct mlx5_core_dev *dev, unsigned int index,
 			   u8 roce_version, u8 roce_l3_type, const u8 *gid,
-			   const u8 *mac, bool vlan, u16 vlan_id)
+			   const u8 *mac, bool vlan, u16 vlan_id, u8 port_num)
 {
 #define MLX5_SET_RA(p, f, v) MLX5_SET(roce_addr_layout, p, f, v)
 	u32  in[MLX5_ST_SZ_DW(set_roce_address_in)] = {0};
@@ -148,6 +148,9 @@ int mlx5_core_roce_gid_set(struct mlx5_core_dev *dev, unsigned int index,
 		memcpy(addr_l3_addr, gid, gidsz);
 	}
 
+	if (MLX5_CAP_GEN(dev, num_vhca_ports) > 0)
+		MLX5_SET(set_roce_address_in, in, vhca_port_num, port_num);
+
 	MLX5_SET(set_roce_address_in, in, roce_address_index, index);
 	MLX5_SET(set_roce_address_in, in, opcode, MLX5_CMD_OP_SET_ROCE_ADDRESS);
 	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index c2c78bc42432..bc494ddefcf8 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1101,7 +1101,7 @@ void mlx5_free_bfreg(struct mlx5_core_dev *mdev, struct mlx5_sq_bfreg *bfreg);
 unsigned int mlx5_core_reserved_gids_count(struct mlx5_core_dev *dev);
 int mlx5_core_roce_gid_set(struct mlx5_core_dev *dev, unsigned int index,
 			   u8 roce_version, u8 roce_l3_type, const u8 *gid,
-			   const u8 *mac, bool vlan, u16 vlan_id);
+			   const u8 *mac, bool vlan, u16 vlan_id, u8 port_num);
 
 static inline int fw_initializing(struct mlx5_core_dev *dev)
 {
-- 
2.15.1


* [PATCH rdma-next 12/15] IB/mlx5: Route MADs for dual port RoCE
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (10 preceding siblings ...)
  2017-12-24 12:57   ` [PATCH rdma-next 11/15] {net,IB}/mlx5: Change set_roce_gid to take a port number Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
  2017-12-24 12:57   ` [PATCH rdma-next 13/15] IB/mlx5: Use correct mdev for vport queries in ib_virt Leon Romanovsky
                     ` (3 subsequent siblings)
  15 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Route performance query MADs to the correct mlx5_core_dev when using
dual port RoCE mode.

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/mad.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index 0559e0a9e398..32a9e9228b13 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -197,10 +197,9 @@ static void pma_cnt_assign(struct ib_pma_portcounters *pma_cnt,
 			     vl_15_dropped);
 }
 
-static int process_pma_cmd(struct ib_device *ibdev, u8 port_num,
+static int process_pma_cmd(struct mlx5_core_dev *mdev, u8 port_num,
 			   const struct ib_mad *in_mad, struct ib_mad *out_mad)
 {
-	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	int err;
 	void *out_cnt;
 
@@ -222,7 +221,7 @@ static int process_pma_cmd(struct ib_device *ibdev, u8 port_num,
 		if (!out_cnt)
 			return IB_MAD_RESULT_FAILURE;
 
-		err = mlx5_core_query_vport_counter(dev->mdev, 0, 0,
+		err = mlx5_core_query_vport_counter(mdev, 0, 0,
 						    port_num, out_cnt, sz);
 		if (!err)
 			pma_cnt_ext_assign(pma_cnt_ext, out_cnt);
@@ -235,7 +234,7 @@ static int process_pma_cmd(struct ib_device *ibdev, u8 port_num,
 		if (!out_cnt)
 			return IB_MAD_RESULT_FAILURE;
 
-		err = mlx5_core_query_ib_ppcnt(dev->mdev, port_num,
+		err = mlx5_core_query_ib_ppcnt(mdev, port_num,
 					       out_cnt, sz);
 		if (!err)
 			pma_cnt_assign(pma_cnt, out_cnt);
@@ -255,9 +254,11 @@ int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 			u16 *out_mad_pkey_index)
 {
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
-	struct mlx5_core_dev *mdev = dev->mdev;
 	const struct ib_mad *in_mad = (const struct ib_mad *)in;
 	struct ib_mad *out_mad = (struct ib_mad *)out;
+	struct mlx5_core_dev *mdev;
+	u8 mdev_port_num;
+	int ret;
 
 	if (WARN_ON_ONCE(in_mad_size != sizeof(*in_mad) ||
 			 *out_mad_size != sizeof(*out_mad)))
@@ -265,14 +266,20 @@ int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 
 	memset(out_mad->data, 0, sizeof(out_mad->data));
 
+	mdev = mlx5_ib_get_native_port_mdev(dev, port_num, &mdev_port_num);
+	if (!mdev)
+		return IB_MAD_RESULT_FAILURE;
+
 	if (MLX5_CAP_GEN(mdev, vport_counters) &&
 	    in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_PERF_MGMT &&
 	    in_mad->mad_hdr.method == IB_MGMT_METHOD_GET) {
-		return process_pma_cmd(ibdev, port_num, in_mad, out_mad);
+		ret = process_pma_cmd(mdev, mdev_port_num, in_mad, out_mad);
 	} else {
-		return process_mad(ibdev, mad_flags, port_num, in_wc, in_grh,
+		ret =  process_mad(ibdev, mad_flags, port_num, in_wc, in_grh,
 				   in_mad, out_mad);
 	}
+	mlx5_ib_put_native_port_mdev(dev, port_num);
+	return ret;
 }
 
 int mlx5_query_ext_port_caps(struct mlx5_ib_dev *dev, u8 port)
-- 
2.15.1


* [PATCH rdma-next 13/15] IB/mlx5: Use correct mdev for vport queries in ib_virt
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (11 preceding siblings ...)
  2017-12-24 12:57   ` [PATCH rdma-next 12/15] IB/mlx5: Route MADs for dual port RoCE Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
       [not found]     ` <20171224125741.25464-14-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2017-12-24 12:57   ` [PATCH rdma-next 14/15] IB/mlx5: Don't advertise RAW QP support in dual port mode Leon Romanovsky
                     ` (2 subsequent siblings)
  15 siblings, 1 reply; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When using dual port RoCE mode, vport queries should be routed to the
mlx5_core_dev of the respective port instead of the IB device's master
mdev.

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/ib_virt.c | 84 +++++++++++++++++++++++++++++-------
 1 file changed, 69 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/ib_virt.c b/drivers/infiniband/hw/mlx5/ib_virt.c
index 649a3364f838..fcdc85e89ba8 100644
--- a/drivers/infiniband/hw/mlx5/ib_virt.c
+++ b/drivers/infiniband/hw/mlx5/ib_virt.c
@@ -52,26 +52,36 @@ int mlx5_ib_get_vf_config(struct ib_device *device, int vf, u8 port,
 			  struct ifla_vf_info *info)
 {
 	struct mlx5_ib_dev *dev = to_mdev(device);
-	struct mlx5_core_dev *mdev = dev->mdev;
 	struct mlx5_hca_vport_context *rep;
+	struct mlx5_core_dev *mdev;
+	u8 mdev_port_num;
 	int err;
 
 	rep = kzalloc(sizeof(*rep), GFP_KERNEL);
 	if (!rep)
 		return -ENOMEM;
 
-	err = mlx5_query_hca_vport_context(mdev, 1, 1,  vf + 1, rep);
+	mdev = mlx5_ib_get_native_port_mdev(dev, port, &mdev_port_num);
+	if (!mdev) {
+		err = -ENODEV;
+		goto out;
+	}
+
+	err = mlx5_query_hca_vport_context(mdev, 1, mdev_port_num, vf + 1,
+					   rep);
 	if (err) {
 		mlx5_ib_warn(dev, "failed to query port policy for vf %d (%d)\n",
 			     vf, err);
-		goto free;
+		goto put_mdev;
 	}
 	memset(info, 0, sizeof(*info));
 	info->linkstate = mlx_to_net_policy(rep->policy);
 	if (info->linkstate == __IFLA_VF_LINK_STATE_MAX)
 		err = -EINVAL;
 
-free:
+put_mdev:
+	mlx5_ib_put_native_port_mdev(dev, port);
+out:
 	kfree(rep);
 	return err;
 }
@@ -94,9 +104,10 @@ int mlx5_ib_set_vf_link_state(struct ib_device *device, int vf,
 			      u8 port, int state)
 {
 	struct mlx5_ib_dev *dev = to_mdev(device);
-	struct mlx5_core_dev *mdev = dev->mdev;
 	struct mlx5_hca_vport_context *in;
-	struct mlx5_vf_context *vfs_ctx = mdev->priv.sriov.vfs_ctx;
+	struct mlx5_vf_context *vfs_ctx;
+	struct mlx5_core_dev *mdev;
+	u8 mdev_port_num;
 	int err;
 
 	in = kzalloc(sizeof(*in), GFP_KERNEL);
@@ -108,11 +119,21 @@ int mlx5_ib_set_vf_link_state(struct ib_device *device, int vf,
 		err = -EINVAL;
 		goto out;
 	}
+
+	mdev = mlx5_ib_get_native_port_mdev(dev, port, &mdev_port_num);
+	if (!mdev) {
+		err = -ENODEV;
+		goto out;
+	}
+
+	vfs_ctx = mdev->priv.sriov.vfs_ctx;
 	in->field_select = MLX5_HCA_VPORT_SEL_STATE_POLICY;
-	err = mlx5_core_modify_hca_vport_context(mdev, 1, 1, vf + 1, in);
+	err = mlx5_core_modify_hca_vport_context(mdev, 1, mdev_port_num,
+						 vf + 1, in);
 	if (!err)
 		vfs_ctx[vf].policy = in->policy;
 
+	mlx5_ib_put_native_port_mdev(dev, port);
 out:
 	kfree(in);
 	return err;
@@ -124,20 +145,29 @@ int mlx5_ib_get_vf_stats(struct ib_device *device, int vf,
 	int out_sz = MLX5_ST_SZ_BYTES(query_vport_counter_out);
 	struct mlx5_core_dev *mdev;
 	struct mlx5_ib_dev *dev;
+	u8 mdev_port_num;
 	void *out;
 	int err;
 
 	dev = to_mdev(device);
-	mdev = dev->mdev;
 
 	out = kzalloc(out_sz, GFP_KERNEL);
 	if (!out)
 		return -ENOMEM;
 
-	err = mlx5_core_query_vport_counter(mdev, true, vf, port, out, out_sz);
+	mdev = mlx5_ib_get_native_port_mdev(dev, port, &mdev_port_num);
+	if (!mdev) {
+		err = -ENODEV;
+		goto ex;
+	}
+
+	err = mlx5_core_query_vport_counter(mdev, true, vf, mdev_port_num, out,
+					    out_sz);
 	if (err)
 		goto ex;
 
+	mlx5_ib_put_native_port_mdev(dev, port);
+
 	stats->rx_packets = MLX5_GET64_PR(query_vport_counter_out, out, received_ib_unicast.packets);
 	stats->tx_packets = MLX5_GET64_PR(query_vport_counter_out, out, transmitted_ib_unicast.packets);
 	stats->rx_bytes = MLX5_GET64_PR(query_vport_counter_out, out, received_ib_unicast.octets);
@@ -152,20 +182,32 @@ int mlx5_ib_get_vf_stats(struct ib_device *device, int vf,
 static int set_vf_node_guid(struct ib_device *device, int vf, u8 port, u64 guid)
 {
 	struct mlx5_ib_dev *dev = to_mdev(device);
-	struct mlx5_core_dev *mdev = dev->mdev;
 	struct mlx5_hca_vport_context *in;
-	struct mlx5_vf_context *vfs_ctx = mdev->priv.sriov.vfs_ctx;
+	struct mlx5_vf_context *vfs_ctx;
+	struct mlx5_core_dev *mdev;
+	u8 mdev_port_num;
 	int err;
 
 	in = kzalloc(sizeof(*in), GFP_KERNEL);
 	if (!in)
 		return -ENOMEM;
 
+	mdev = mlx5_ib_get_native_port_mdev(dev, port, &mdev_port_num);
+	if (!mdev) {
+		err = -ENODEV;
+		goto out;
+	}
+
+	vfs_ctx = mdev->priv.sriov.vfs_ctx;
 	in->field_select = MLX5_HCA_VPORT_SEL_NODE_GUID;
 	in->node_guid = guid;
-	err = mlx5_core_modify_hca_vport_context(mdev, 1, 1, vf + 1, in);
+	err = mlx5_core_modify_hca_vport_context(mdev, 1, mdev_port_num,
+						 vf + 1, in);
 	if (!err)
 		vfs_ctx[vf].node_guid = guid;
+
+	mlx5_ib_put_native_port_mdev(dev, port);
+out:
 	kfree(in);
 	return err;
 }
@@ -173,20 +215,32 @@ static int set_vf_node_guid(struct ib_device *device, int vf, u8 port, u64 guid)
 static int set_vf_port_guid(struct ib_device *device, int vf, u8 port, u64 guid)
 {
 	struct mlx5_ib_dev *dev = to_mdev(device);
-	struct mlx5_core_dev *mdev = dev->mdev;
 	struct mlx5_hca_vport_context *in;
-	struct mlx5_vf_context *vfs_ctx = mdev->priv.sriov.vfs_ctx;
+	struct mlx5_vf_context *vfs_ctx;
+	struct mlx5_core_dev *mdev;
+	u8 mdev_port_num;
 	int err;
 
 	in = kzalloc(sizeof(*in), GFP_KERNEL);
 	if (!in)
 		return -ENOMEM;
 
+	mdev = mlx5_ib_get_native_port_mdev(dev, port, &mdev_port_num);
+	if (!mdev) {
+		err = -ENODEV;
+		goto out;
+	}
+
+	vfs_ctx = mdev->priv.sriov.vfs_ctx;
 	in->field_select = MLX5_HCA_VPORT_SEL_PORT_GUID;
 	in->port_guid = guid;
-	err = mlx5_core_modify_hca_vport_context(mdev, 1, 1, vf + 1, in);
+	err = mlx5_core_modify_hca_vport_context(mdev, 1, mdev_port_num,
+						 vf + 1, in);
 	if (!err)
 		vfs_ctx[vf].port_guid = guid;
+
+	mlx5_ib_put_native_port_mdev(dev, port);
+out:
 	kfree(in);
 	return err;
 }
-- 
2.15.1


* [PATCH rdma-next 14/15] IB/mlx5: Don't advertise RAW QP support in dual port mode
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (12 preceding siblings ...)
  2017-12-24 12:57   ` [PATCH rdma-next 13/15] IB/mlx5: Use correct mdev for vport queries in ib_virt Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
       [not found]     ` <20171224125741.25464-15-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2017-12-24 12:57   ` [PATCH rdma-next 15/15] net/mlx5: Set num_vhca_ports capability Leon Romanovsky
  2017-12-24 21:48   ` [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE Or Gerlitz
  15 siblings, 1 reply; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When operating in dual port RoCE mode, RAW QPs are not supported on the
slave port, so don't advertise RAW packet capabilities in that mode.
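
The gate is a single predicate; a minimal sketch, with an illustrative
name, of the check applied throughout the diff:

static bool my_raw_qp_supported(struct mlx5_core_dev *mdev)
{
	/* RAW packet QP capabilities are only advertised when the device
	 * is not operating in multi-port (dual port RoCE) mode.
	 */
	return !mlx5_core_mp_enabled(mdev);
}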

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 261c2a8db6b3..9e2d6a872e5a 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -710,6 +710,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	int max_rq_sg;
 	int max_sq_sg;
 	u64 min_page_size = 1ull << MLX5_CAP_GEN(mdev, log_pg_sz);
+	bool raw_support = !mlx5_core_mp_enabled(mdev);
 	struct mlx5_ib_query_device_resp resp = {};
 	size_t resp_len;
 	u64 max_tso;
@@ -773,7 +774,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	if (MLX5_CAP_GEN(mdev, block_lb_mc))
 		props->device_cap_flags |= IB_DEVICE_BLOCK_MULTICAST_LOOPBACK;
 
-	if (MLX5_CAP_GEN(dev->mdev, eth_net_offloads)) {
+	if (MLX5_CAP_GEN(dev->mdev, eth_net_offloads) && raw_support) {
 		if (MLX5_CAP_ETH(mdev, csum_cap)) {
 			/* Legacy bit to support old userspace libraries */
 			props->device_cap_flags |= IB_DEVICE_RAW_IP_CSUM;
@@ -822,7 +823,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	}
 
 	if (MLX5_CAP_GEN(dev->mdev, rq_delay_drop) &&
-	    MLX5_CAP_GEN(dev->mdev, general_notification_event))
+	    MLX5_CAP_GEN(dev->mdev, general_notification_event) &&
+	    raw_support)
 		props->raw_packet_caps |= IB_RAW_PACKET_CAP_DELAY_DROP;
 
 	if (MLX5_CAP_GEN(mdev, ipoib_enhanced_offloads) &&
@@ -830,7 +832,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 		props->device_cap_flags |= IB_DEVICE_UD_IP_CSUM;
 
 	if (MLX5_CAP_GEN(dev->mdev, eth_net_offloads) &&
-	    MLX5_CAP_ETH(dev->mdev, scatter_fcs)) {
+	    MLX5_CAP_ETH(dev->mdev, scatter_fcs) &&
+	    raw_support) {
 		/* Legacy bit to support old userspace libraries */
 		props->device_cap_flags |= IB_DEVICE_RAW_SCATTER_FCS;
 		props->raw_packet_caps |= IB_RAW_PACKET_CAP_SCATTER_FCS;
@@ -894,7 +897,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 		props->device_cap_flags |= IB_DEVICE_VIRTUAL_FUNCTION;
 
 	if (mlx5_ib_port_link_layer(ibdev, 1) ==
-	    IB_LINK_LAYER_ETHERNET) {
+	    IB_LINK_LAYER_ETHERNET && raw_support) {
 		props->rss_caps.max_rwq_indirection_tables =
 			1 << MLX5_CAP_GEN(dev->mdev, log_max_rqt);
 		props->rss_caps.max_rwq_indirection_table_size =
@@ -931,7 +934,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 		resp.response_length += sizeof(resp.cqe_comp_caps);
 	}
 
-	if (field_avail(typeof(resp), packet_pacing_caps, uhw->outlen)) {
+	if (field_avail(typeof(resp), packet_pacing_caps, uhw->outlen) &&
+	    raw_support) {
 		if (MLX5_CAP_QOS(mdev, packet_pacing) &&
 		    MLX5_CAP_GEN(mdev, qos)) {
 			resp.packet_pacing_caps.qp_rate_limit_max =
@@ -990,7 +994,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 		}
 	}
 
-	if (field_avail(typeof(resp), striding_rq_caps, uhw->outlen)) {
+	if (field_avail(typeof(resp), striding_rq_caps, uhw->outlen) &&
+	    raw_support) {
 		resp.response_length += sizeof(resp.striding_rq_caps);
 		if (MLX5_CAP_GEN(mdev, striding_rq)) {
 			resp.striding_rq_caps.min_single_stride_log_num_of_bytes =
@@ -3561,12 +3566,14 @@ static u32 get_core_cap_flags(struct ib_device *ibdev)
 	enum rdma_link_layer ll = mlx5_ib_port_link_layer(ibdev, 1);
 	u8 l3_type_cap = MLX5_CAP_ROCE(dev->mdev, l3_type);
 	u8 roce_version_cap = MLX5_CAP_ROCE(dev->mdev, roce_version);
+	bool raw_support = !mlx5_core_mp_enabled(dev->mdev);
 	u32 ret = 0;
 
 	if (ll == IB_LINK_LAYER_INFINIBAND)
 		return RDMA_CORE_PORT_IBA_IB;
 
-	ret = RDMA_CORE_PORT_RAW_PACKET;
+	if (raw_support)
+		ret = RDMA_CORE_PORT_RAW_PACKET;
 
 	if (!(l3_type_cap & MLX5_ROCE_L3_TYPE_IPV4_CAP))
 		return ret;
-- 
2.15.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH rdma-next 15/15] net/mlx5: Set num_vhca_ports capability
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (13 preceding siblings ...)
  2017-12-24 12:57   ` [PATCH rdma-next 14/15] IB/mlx5: Don't advertise RAW QP support in dual port mode Leon Romanovsky
@ 2017-12-24 12:57   ` Leon Romanovsky
  2017-12-24 21:48   ` [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE Or Gerlitz
  15 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-24 12:57 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens, Parav Pandit

From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Set the current capability to the max capability. Doing so enables dual
port RoCE functionality if supported by the firmware.

Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 5f3dc0ede917..77c9f7e42a99 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -554,6 +554,12 @@ static int handle_hca_cap(struct mlx5_core_dev *dev)
 			 cache_line_128byte,
 			 cache_line_size() == 128 ? 1 : 0);
 
+	if (MLX5_CAP_GEN_MAX(dev, num_vhca_ports))
+		MLX5_SET(cmd_hca_cap,
+			 set_hca_cap,
+			 num_vhca_ports,
+			 MLX5_CAP_GEN_MAX(dev, num_vhca_ports));
+
 	err = set_caps(dev, set_ctx, set_sz,
 		       MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE);
 
-- 
2.15.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE
       [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
                     ` (14 preceding siblings ...)
  2017-12-24 12:57   ` [PATCH rdma-next 15/15] net/mlx5: Set num_vhca_ports capability Leon Romanovsky
@ 2017-12-24 21:48   ` Or Gerlitz
  15 siblings, 0 replies; 47+ messages in thread
From: Or Gerlitz @ 2017-12-24 21:48 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Doug Ledford, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Parav Pandit, Leon Romanovsky, Saeed Mahameed

On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> From Daniel:
>
> This feature allows RDMA resources (pd, mr, cq, qp, etc) to be used with
> both physical ports of capable mlx5 devices. When enabled a single IB
> device with two ports will be registered instead of two single port
> devices.
>
> There are still two PCI devices underlying the two port device, the
> capabilities indicate which device is the "master" device and which is
> the slave.
>
> When the add callback function is called for a slave device a list of IB
> devices is searched for matching master device, indicated by the capabilities
> and the system_image_guid. If a match is found the slave is bound to the
> master device, otherwise it's placed on a list, in case it's master becomes
> available in the future. When a master device is added it searches the list
> of available slaves for a matching slave device. If a match is found it binds
> the slave as its 2nd port. If no match as found the device still appears
> as a dual port device, with the 2nd port down. RDMA resources can still
> created that use the yet unavailable 2nd port.
>
> Commands related to IB resources are all routed through the master mlx5_core
> device. Port specific commands, like those for hardware counters are routed to
> their respective port mlx5_core device. Since devices can appear and disappear
> asynchronously a reference count on the underlying mlx5_core device is
> maintained. Getting and putting this reference is only necessary for commands
> destined to a specific port, the master core device can be used freely,
> as it will exist while the IB device exists.

Daniel, so far you have only talked about IB devices and the
master/slave relation you introduce for the core devices.


> SR-IOV devices follow the same pattern as the physical ones. VFs of a master
> port can bind VFs of slave ports, if available, and operate as dual port devices.

Not following.

We still have two mlx5 EN devices, right?! And from the libvirt point
of view business is just as usual: a VM will be assigned one VF from
either of the core devices (master only, slave only or both) and this
will be an IB device with one port - correct?

Note that most of us are OOO till early January (== early next week),
so we should be able to follow up on your proposal then.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 01/15] net/mlx5: Fix race for multiple RoCE enable
       [not found]     ` <20171224125741.25464-2-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2017-12-24 21:53       ` Or Gerlitz
       [not found]         ` <CAJ3xEMg8U_4DpYEWa8t1QpccNzjU6f0shEtQ7fRWg-EtGupy+w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Or Gerlitz @ 2017-12-24 21:53 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Doug Ledford, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Parav Pandit

On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> There are two potential problems with the existing implementation.
>
> 1. Enable and disable can race after the atomic operations,

s/,/./

> 2. If a command fails the refcount is left in an inconsistent state.
>
> Introduce a lock and perform error checking.
>
> Fixes: a6f7d2aff623 ("net/mlx5: Add support for multiple RoCE enable")

Dan, please send this to net as a fix

> Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 02/15] net/mlx5: Set software owner ID during init HCA
       [not found]     ` <20171224125741.25464-3-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2017-12-24 22:00       ` Or Gerlitz
       [not found]         ` <CAJ3xEMiq-4DGFW-Z3hX3NfsSGXD6bm_uarGF_cm6K7+YuutJBQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Or Gerlitz @ 2017-12-24 22:00 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Doug Ledford, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Parav Pandit, Saeed Mahameed, Leon Romanovsky

On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> Generate a unique 128bit identifier for each host and pass that value to
> firmware in the INIT_HCA command if it reports the sw_owner_id
> capo ability. This value is used by FW to determine if functions are in
> use by the same host.

"capo ability"?

did you want to say the same driver instance? b/c multiple instances
of the driver can run on a host


> Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/fw.c        | 10 +++++++++-
>  drivers/net/ethernet/mellanox/mlx5/core/main.c      |  6 +++++-
>  drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h |  2 +-
>  include/linux/mlx5/device.h                         |  5 +++++
>  include/linux/mlx5/mlx5_ifc.h                       |  5 ++++-
>  5 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
> index 5ef1b56b6a96..9d11e92fb541 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
> @@ -195,12 +195,20 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
>         return 0;
>  }
>
> -int mlx5_cmd_init_hca(struct mlx5_core_dev *dev)
> +int mlx5_cmd_init_hca(struct mlx5_core_dev *dev, uint32_t *sw_owner_id)
>  {
>         u32 out[MLX5_ST_SZ_DW(init_hca_out)] = {0};
>         u32 in[MLX5_ST_SZ_DW(init_hca_in)]   = {0};
> +       int i;
>
>         MLX5_SET(init_hca_in, in, opcode, MLX5_CMD_OP_INIT_HCA);
> +
> +       if (MLX5_CAP_GEN(dev, sw_owner_id)) {
> +               for (i = 0; i < 4; i++)
> +                       MLX5_ARRAY_SET(init_hca_in, in, sw_owner_id, i,
> +                                      sw_owner_id[i]);
> +       }
> +
>         return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
>  }
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 5f323442cc5a..5f3dc0ede917 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -75,6 +75,8 @@ static unsigned int prof_sel = MLX5_DEFAULT_PROF;
>  module_param_named(prof_sel, prof_sel, uint, 0444);
>  MODULE_PARM_DESC(prof_sel, "profile selector. Valid range 0 - 2");
>
> +static u32 sw_owner_id[4];
> +
>  enum {
>         MLX5_ATOMIC_REQ_MODE_BE = 0x0,
>         MLX5_ATOMIC_REQ_MODE_HOST_ENDIANNESS = 0x1,
> @@ -1052,7 +1054,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>                 goto reclaim_boot_pages;
>         }
>
> -       err = mlx5_cmd_init_hca(dev);
> +       err = mlx5_cmd_init_hca(dev, sw_owner_id);
>         if (err) {
>                 dev_err(&pdev->dev, "init hca failed\n");
>                 goto err_pagealloc_stop;
> @@ -1574,6 +1576,8 @@ static int __init init(void)
>  {
>         int err;
>
> +       get_random_bytes(&sw_owner_id, sizeof(sw_owner_id));
> +
>         mlx5_core_verify_params();
>         mlx5_register_debugfs();
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
> index ff4a0b889a6f..b05868728da7 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
> @@ -86,7 +86,7 @@ enum {
>
>  int mlx5_query_hca_caps(struct mlx5_core_dev *dev);
>  int mlx5_query_board_id(struct mlx5_core_dev *dev);
> -int mlx5_cmd_init_hca(struct mlx5_core_dev *dev);
> +int mlx5_cmd_init_hca(struct mlx5_core_dev *dev, uint32_t *sw_owner_id);
>  int mlx5_cmd_teardown_hca(struct mlx5_core_dev *dev);
>  int mlx5_cmd_force_teardown_hca(struct mlx5_core_dev *dev);
>  void mlx5_core_event(struct mlx5_core_dev *dev, enum mlx5_dev_event event,
> diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
> index 409ffb14298a..18c041966ab8 100644
> --- a/include/linux/mlx5/device.h
> +++ b/include/linux/mlx5/device.h
> @@ -79,6 +79,11 @@
>                      << __mlx5_dw_bit_off(typ, fld))); \
>  } while (0)
>
> +#define MLX5_ARRAY_SET(typ, p, fld, idx, v) do { \
> +       BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 32); \
> +       MLX5_SET(typ, p, fld[idx], v); \
> +} while (0)
> +

Parav, Dan, as I pointed out to you during the dscp trust mode work,
changes to this area of the code should carry the RB sig of Eli Cohen,
or of Saeed if you can't get Eli - please do that.


>  #define MLX5_SET_TO_ONES(typ, p, fld) do { \
>         BUILD_BUG_ON(__mlx5_st_sz_bits(typ) % 32);             \
>         *((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
> diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
> index 38a7577a9ce7..b1c81d7a86cb 100644
> --- a/include/linux/mlx5/mlx5_ifc.h
> +++ b/include/linux/mlx5/mlx5_ifc.h
> @@ -1066,7 +1066,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
>         u8         reserved_at_5f8[0x3];
>         u8         log_max_xrq[0x5];
>
> -       u8         reserved_at_600[0x200];
> +       u8         reserved_at_600[0x1e];
> +       u8         sw_owner_id;
> +       u8         reserved_at_61f[0x1e1];
>  };
>
>  enum mlx5_flow_destination_type {
> @@ -5531,6 +5533,7 @@ struct mlx5_ifc_init_hca_in_bits {
>         u8         op_mod[0x10];
>
>         u8         reserved_at_40[0x40];
> +       u8         sw_owner_id[4][0x20];
>  };

Can we do here just a plain addition of bits? I don't have the code in
front of me, but it seems suspicious.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 13/15] IB/mlx5: Use correct mdev for vport queries in ib_virt
       [not found]     ` <20171224125741.25464-14-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2017-12-24 22:14       ` Or Gerlitz
       [not found]         ` <CAJ3xEMgyD693JPtu_vag-q3kVP=Nyag=dPEp15AZcyX62FwkPw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Or Gerlitz @ 2017-12-24 22:14 UTC (permalink / raw)
  To: Daniel Jurgens, Eli Cohen
  Cc: Doug Ledford, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Parav Pandit, Leon Romanovsky

On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> When using dual port RoCE mode vport queries should be routed to the
> respective port mlx5_core_dev instead of the IB devices master mdev.

Isn't this code dealing only with IB (== InfiniBand) virtualization?

What role does the modified code have for Eth (== RoCE)?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 14/15] IB/mlx5: Don't advertise RAW QP support in dual port mode
       [not found]     ` <20171224125741.25464-15-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2017-12-24 22:16       ` Or Gerlitz
       [not found]         ` <CAJ3xEMgAtd1PKN2h1NFj=Mt72msuU5bpFy0jO4KFXxXxGRf7hA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Or Gerlitz @ 2017-12-24 22:16 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Doug Ledford, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Parav Pandit, Leon Romanovsky

On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> When operating in dual port RoCE mode RAW QPs are not supposed
> to work on the slave port.

why?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 02/15] net/mlx5: Set software owner ID during init HCA
       [not found]         ` <CAJ3xEMiq-4DGFW-Z3hX3NfsSGXD6bm_uarGF_cm6K7+YuutJBQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-25  5:35           ` Leon Romanovsky
       [not found]             ` <20171225053534.GT2942-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  2017-12-27 15:27           ` Daniel Jurgens
  1 sibling, 1 reply; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-25  5:35 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Daniel Jurgens, Doug Ledford, Jason Gunthorpe,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Parav Pandit, Saeed Mahameed

[-- Attachment #1: Type: text/plain, Size: 810 bytes --]

On Mon, Dec 25, 2017 at 12:00:43AM +0200, Or Gerlitz wrote:
> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >
> >
> > +#define MLX5_ARRAY_SET(typ, p, fld, idx, v) do { \
> > +       BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 32); \
> > +       MLX5_SET(typ, p, fld[idx], v); \
> > +} while (0)
> > +
>
> Parav, Dan, we I pointed to you during the dscp trust mode work, changes
> to this area of the code should have the RB sig of Eli Cohen or Saeed if you
> can't get Eli, please do that.
>

Or,

This patch and the other patches in this series were seen by Saeed
during preparation of the shared code pull request. In addition, he has
been added as a reviewer for months.

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 01/15] net/mlx5: Fix race for multiple RoCE enable
       [not found]         ` <CAJ3xEMg8U_4DpYEWa8t1QpccNzjU6f0shEtQ7fRWg-EtGupy+w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-25  5:59           ` Leon Romanovsky
       [not found]             ` <20171225055916.GU2942-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-25  5:59 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Daniel Jurgens, Doug Ledford, Jason Gunthorpe,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Parav Pandit

[-- Attachment #1: Type: text/plain, Size: 1112 bytes --]

On Sun, Dec 24, 2017 at 11:53:16PM +0200, Or Gerlitz wrote:
> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >
> > There are two potential problems with the existing implementation.
> >
> > 1. Enable and disable can race after the atomic operations,
>
> s/,/./
>
> > 2. If a command fails the refcount is left in an inconsistent state.
> >
> > Introduce a lock and perform error checking.
> >
> > Fixes: a6f7d2aff623 ("net/mlx5: Add support for multiple RoCE enable")
>
> Dan, please send this to net as a fix
>

Thanks Or for the suggestion, but we will proceed with our initial plan
and continue to submit this as part of our original dual-port-roce
series.

Thanks

> > Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 14/15] IB/mlx5: Don't advertise RAW QP support in dual port mode
       [not found]         ` <CAJ3xEMgAtd1PKN2h1NFj=Mt72msuU5bpFy0jO4KFXxXxGRf7hA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-25  7:16           ` Leon Romanovsky
       [not found]             ` <20171225071613.GW2942-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-25  7:16 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Daniel Jurgens, Doug Ledford, Jason Gunthorpe,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Parav Pandit

[-- Attachment #1: Type: text/plain, Size: 621 bytes --]

On Mon, Dec 25, 2017 at 12:16:15AM +0200, Or Gerlitz wrote:
> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >
> > When operating in dual port RoCE mode RAW QPs are not supposed
> > to work on the slave port.
>
> why?

There is a HW flow table creation restriction.

Thanks

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 13/15] IB/mlx5: Use correct mdev for vport queries in ib_virt
       [not found]         ` <CAJ3xEMgyD693JPtu_vag-q3kVP=Nyag=dPEp15AZcyX62FwkPw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-25  7:25           ` Leon Romanovsky
  0 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2017-12-25  7:25 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Daniel Jurgens, Eli Cohen, Doug Ledford, Jason Gunthorpe,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Parav Pandit

[-- Attachment #1: Type: text/plain, Size: 811 bytes --]

On Mon, Dec 25, 2017 at 12:14:00AM +0200, Or Gerlitz wrote:
> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >
> > When using dual port RoCE mode vport queries should be routed to the
> > respective port mlx5_core_dev instead of the IB devices master mdev.
>
> isn't this code dealing only with IB (== Infiniband) virtualization?
>
> what role the modified code has over Eth (== RoCE)?

I think that you are right and this change is not needed.

Thanks

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 14/15] IB/mlx5: Don't advertise RAW QP support in dual port mode
       [not found]             ` <20171225071613.GW2942-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-12-25  9:17               ` Or Gerlitz
       [not found]                 ` <CAJ3xEMhtssDWuu8JAB_jhyofKbkeo4vYTLixLeta6vPzKQOp+A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Or Gerlitz @ 2017-12-25  9:17 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Or Gerlitz, Daniel Jurgens, Doug Ledford, Jason Gunthorpe,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Parav Pandit

On Mon, Dec 25, 2017 at 9:16 AM, Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On Mon, Dec 25, 2017 at 12:16:15AM +0200, Or Gerlitz wrote:
>> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> > From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> >
>> > When operating in dual port RoCE mode RAW QPs are not supposed
>> > to work on the slave port.
>>
>> why?
>
> HW create flow table restriction.


Cryptic to me. Either put a proper explanation in the V2 change log
for this patch so we can review it then, or explain it here if you want
a review on V1.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 01/15] net/mlx5: Fix race for multiple RoCE enable
       [not found]             ` <20171225055916.GU2942-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-12-25  9:21               ` Or Gerlitz
  0 siblings, 0 replies; 47+ messages in thread
From: Or Gerlitz @ 2017-12-25  9:21 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Daniel Jurgens, Doug Ledford, Jason Gunthorpe,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Parav Pandit

On Mon, Dec 25, 2017 at 7:59 AM, Leon Romanovsky wrote:
> On Sun, Dec 24, 2017 at 11:53:16PM +0200, Or Gerlitz wrote:

>> Dan, please send this to net as a fix

> we will proceed with our initial plan and continue to submit this as part of our original dual-port-roce
> series.

This is a patch to the core driver, which is maintained @ netdev. The
netdev maintainer stated clearly to us that fixes should go to net, or
to rdma-rc if you want to get them through the rdma tree. Note that you
should not lose a cycle just b/c you stepped on a bug during feature
development; unfortunately it happens a lot... We send a net patch and
then rebase the net-next submission on top of it, and you can do the
same for rdma rc/next.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE
       [not found]   ` <CAJ3xEMhZgEee+VLpV4bV150siOdXwpcp64AGqeqr5Y2o--WRdw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-25  9:24     ` Or Gerlitz
       [not found]       ` <CAJ3xEMgpP0Sbj4vY3_pJDjrDqHLHmkaTSGLyVuBY+aoC6VUnHA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-12-27 15:22     ` Daniel Jurgens
  1 sibling, 1 reply; 47+ messages in thread
From: Or Gerlitz @ 2017-12-25  9:24 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Doug Ledford, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Parav Pandit, Leon Romanovsky, Saeed Mahameed

On Sun, Dec 24, 2017 at 11:47 PM, Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky wrote:


>> SR-IOV devices follow the same pattern as the physical ones. VFs of a
>> master port can bind VFs of slave ports, if available, and operate as dual port
>> devices.


> We still have two mlx5 EN devices, right?! and from libvirt point of view
> business are just as usual,
> a VM will be assigned with one VF from any of the core devices (master only,
> slave only or both) and this will be an IB device with one port - correct?

What happens if the admin provisions two VF devices in each of the
following patterns:

(slave, slave)
(master, master)
(master, slave) <-- this one you were targeting
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE
       [not found]   ` <CAJ3xEMhZgEee+VLpV4bV150siOdXwpcp64AGqeqr5Y2o--WRdw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-12-25  9:24     ` Or Gerlitz
@ 2017-12-27 15:22     ` Daniel Jurgens
       [not found]       ` <382ba516-bf7b-0a0b-7a9f-604cbf805c80-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 47+ messages in thread
From: Daniel Jurgens @ 2017-12-27 15:22 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Parav Pandit, Leon Romanovsky, Saeed Mahameed

On 12/24/2017 3:47 PM, Or Gerlitz wrote:
> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>
>> From Daniel:
>>
>> This feature allows RDMA resources (pd, mr, cq, qp, etc) to be used with
>> both physical ports of capable mlx5 devices. When enabled a single IB
>> device with two ports will be registered instead of two single port
>> devices.
>>
>> There are still two PCI devices underlying the two port device, the
>> capabilities indicate which device is the "master" device and which is
>> the slave.
>>
>> When the add callback function is called for a slave device a list of IB
>> devices is searched for matching master device, indicated by the
>> capabilities
>> and the system_image_guid. If a match is found the slave is bound to the
>> master device, otherwise it's placed on a list, in case it's master becomes
>> available in the future. When a master device is added it searches the list
>> of available slaves for a matching slave device. If a match is found it
>> binds
>> the slave as its 2nd port. If no match as found the device still appears
>> as a dual port device, with the 2nd port down. RDMA resources can still
>> created that use the yet unavailable 2nd port.
>>
>> Commands related to IB resources are all routed through the master
>> mlx5_core
>> device. Port specific commands, like those for hardware counters are
>> routed to
>> their respective port mlx5_core device. Since devices can appear and
>> disappear
>> asynchronously a reference count on the underlying mlx5_core device is
>> maintained. Getting and putting this reference is only necessary for
>> commands
>> destined to a specific port, the master core device can be used freely,
>> as it will exist while the IB device exists.
>>
>>
> Daniel, so far you only talked on IB devices and a master/slave relation
> you introduce for the
> core devices
>
>
>
>> SR-IOV devices follow the same pattern as the physical ones. VFs of a
>> master
>> port can bind VFs of slave ports, if available, and operate as dual
>> port devices.
>>
> Not following.
>
> We still have two mlx5 EN devices, right?! and from libvirt point of view
> business are just as usual,
Yes, 2 netdevs.
> a VM will be assigned with one VF from any of the core devices (master
> only, slave only or both)
> and this will be an IB device with one port - correct?

No. If the PCI device passed to the VM is a master, the VM will show an IB device with 2 ports, one of them down. If the PCI device is a slave, there will be no IB device for it in the VM. If the VM is passed one master and one slave, then there will be a 2 port IB device with both ports available.


> Note that most of us are ooo till early January (== early next week), so
> should be able to follow on your proposal then.
>
> Or.
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 02/15] net/mlx5: Set software owner ID during init HCA
       [not found]         ` <CAJ3xEMiq-4DGFW-Z3hX3NfsSGXD6bm_uarGF_cm6K7+YuutJBQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-12-25  5:35           ` Leon Romanovsky
@ 2017-12-27 15:27           ` Daniel Jurgens
       [not found]             ` <b329f348-2462-2bf4-ee06-064e798c9b86-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 47+ messages in thread
From: Daniel Jurgens @ 2017-12-27 15:27 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Parav Pandit, Saeed Mahameed, Leon Romanovsky

On 12/24/2017 4:00 PM, Or Gerlitz wrote:
> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> Generate a unique 128bit identifier for each host and pass that value to
>> firmware in the INIT_HCA command if it reports the sw_owner_id
>> capo ability. This value is used by FW to determine if functions are in
>> use by the same host.
> "capo ability"?
>
> did you want to say the same driver instance? b/c multiple instances
> of the driver can run on a host
No, we want the same SW owner ID set for all devices on that host.  Not per driver instance.
>
>
>> Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Reviewed-by: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> ---
>>  drivers/net/ethernet/mellanox/mlx5/core/fw.c        | 10 +++++++++-
>>  drivers/net/ethernet/mellanox/mlx5/core/main.c      |  6 +++++-
>>  drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h |  2 +-
>>  include/linux/mlx5/device.h                         |  5 +++++
>>  include/linux/mlx5/mlx5_ifc.h                       |  5 ++++-
>>  5 files changed, 24 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
>> index 5ef1b56b6a96..9d11e92fb541 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
>> @@ -195,12 +195,20 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
>>         return 0;
>>  }
>>
>> -int mlx5_cmd_init_hca(struct mlx5_core_dev *dev)
>> +int mlx5_cmd_init_hca(struct mlx5_core_dev *dev, uint32_t *sw_owner_id)
>>  {
>>         u32 out[MLX5_ST_SZ_DW(init_hca_out)] = {0};
>>         u32 in[MLX5_ST_SZ_DW(init_hca_in)]   = {0};
>> +       int i;
>>
>>         MLX5_SET(init_hca_in, in, opcode, MLX5_CMD_OP_INIT_HCA);
>> +
>> +       if (MLX5_CAP_GEN(dev, sw_owner_id)) {
>> +               for (i = 0; i < 4; i++)
>> +                       MLX5_ARRAY_SET(init_hca_in, in, sw_owner_id, i,
>> +                                      sw_owner_id[i]);
>> +       }
>> +
>>         return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
>>  }
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> index 5f323442cc5a..5f3dc0ede917 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> @@ -75,6 +75,8 @@ static unsigned int prof_sel = MLX5_DEFAULT_PROF;
>>  module_param_named(prof_sel, prof_sel, uint, 0444);
>>  MODULE_PARM_DESC(prof_sel, "profile selector. Valid range 0 - 2");
>>
>> +static u32 sw_owner_id[4];
>> +
>>  enum {
>>         MLX5_ATOMIC_REQ_MODE_BE = 0x0,
>>         MLX5_ATOMIC_REQ_MODE_HOST_ENDIANNESS = 0x1,
>> @@ -1052,7 +1054,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>>                 goto reclaim_boot_pages;
>>         }
>>
>> -       err = mlx5_cmd_init_hca(dev);
>> +       err = mlx5_cmd_init_hca(dev, sw_owner_id);
>>         if (err) {
>>                 dev_err(&pdev->dev, "init hca failed\n");
>>                 goto err_pagealloc_stop;
>> @@ -1574,6 +1576,8 @@ static int __init init(void)
>>  {
>>         int err;
>>
>> +       get_random_bytes(&sw_owner_id, sizeof(sw_owner_id));
>> +
>>         mlx5_core_verify_params();
>>         mlx5_register_debugfs();
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
>> index ff4a0b889a6f..b05868728da7 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
>> @@ -86,7 +86,7 @@ enum {
>>
>>  int mlx5_query_hca_caps(struct mlx5_core_dev *dev);
>>  int mlx5_query_board_id(struct mlx5_core_dev *dev);
>> -int mlx5_cmd_init_hca(struct mlx5_core_dev *dev);
>> +int mlx5_cmd_init_hca(struct mlx5_core_dev *dev, uint32_t *sw_owner_id);
>>  int mlx5_cmd_teardown_hca(struct mlx5_core_dev *dev);
>>  int mlx5_cmd_force_teardown_hca(struct mlx5_core_dev *dev);
>>  void mlx5_core_event(struct mlx5_core_dev *dev, enum mlx5_dev_event event,
>> diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
>> index 409ffb14298a..18c041966ab8 100644
>> --- a/include/linux/mlx5/device.h
>> +++ b/include/linux/mlx5/device.h
>> @@ -79,6 +79,11 @@
>>                      << __mlx5_dw_bit_off(typ, fld))); \
>>  } while (0)
>>
>> +#define MLX5_ARRAY_SET(typ, p, fld, idx, v) do { \
>> +       BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 32); \
>> +       MLX5_SET(typ, p, fld[idx], v); \
>> +} while (0)
>> +
> Parav, Dan, we I pointed to you during the dscp trust mode work, changes
> to this area of the code should have the RB sig of Eli Cohen or Saeed if you
> can't get Eli, please do that.
>
>
>>  #define MLX5_SET_TO_ONES(typ, p, fld) do { \
>>         BUILD_BUG_ON(__mlx5_st_sz_bits(typ) % 32);             \
>>         *((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
>> diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
>> index 38a7577a9ce7..b1c81d7a86cb 100644
>> --- a/include/linux/mlx5/mlx5_ifc.h
>> +++ b/include/linux/mlx5/mlx5_ifc.h
>> @@ -1066,7 +1066,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
>>         u8         reserved_at_5f8[0x3];
>>         u8         log_max_xrq[0x5];
>>
>> -       u8         reserved_at_600[0x200];
>> +       u8         reserved_at_600[0x1e];
>> +       u8         sw_owner_id;
>> +       u8         reserved_at_61f[0x1e1];
>>  };
>>
>>  enum mlx5_flow_destination_type {
>> @@ -5531,6 +5533,7 @@ struct mlx5_ifc_init_hca_in_bits {
>>         u8         op_mod[0x10];
>>
>>         u8         reserved_at_40[0x40];
>> +       u8         sw_owner_id[4][0x20];
>>  };
> can we do here just a plane addition of bits? don't have the code
> infront of me, but seems suspicious
I don't understand what you are asking.  This is what the tool generated.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 02/15] net/mlx5: Set software owner ID during init HCA
       [not found]             ` <b329f348-2462-2bf4-ee06-064e798c9b86-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-12-27 17:52               ` Jason Gunthorpe
       [not found]                 ` <20171227175208.GD31310-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2018-01-02  9:34               ` Or Gerlitz
  1 sibling, 1 reply; 47+ messages in thread
From: Jason Gunthorpe @ 2017-12-27 17:52 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Parav Pandit, Saeed Mahameed, Leon Romanovsky

On Wed, Dec 27, 2017 at 09:27:20AM -0600, Daniel Jurgens wrote:
> On 12/24/2017 4:00 PM, Or Gerlitz wrote:
> > On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> >> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>
> >> Generate a unique 128bit identifier for each host and pass that value to
> >> firmware in the INIT_HCA command if it reports the sw_owner_id
> >> capo ability. This value is used by FW to determine if functions are in
> >> use by the same host.
> > "capo ability"?
> >
> > did you want to say the same driver instance? b/c multiple instances
> > of the driver can run on a host

> No, we want to set the same SW owner ID set for all devices for that
> host. Not per driver instance.

This commit message is confusing, can you explain it better please?

I'm guessing the owner_id is used to match the individual ports' PCI
functions?

So, any 'host' that can see multiple PCI functions which share ports
should set the same ID?

What happens when a driver is detached? Eg I rebind the PCI function
to VFIO. Does something reset the ID?

Not entirely sure why this is being done at all; shouldn't the PCI
functions be rather unambiguous?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 02/15] net/mlx5: Set software owner ID during init HCA
       [not found]                 ` <20171227175208.GD31310-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-12-27 18:06                   ` Daniel Jurgens
       [not found]                     ` <37588052-e7cd-b2ad-b2e0-91e03a9401d1-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Jurgens @ 2017-12-27 18:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Parav Pandit, Saeed Mahameed, Leon Romanovsky

On 12/27/2017 11:52 AM, Jason Gunthorpe wrote:
> On Wed, Dec 27, 2017 at 09:27:20AM -0600, Daniel Jurgens wrote:
>> On 12/24/2017 4:00 PM, Or Gerlitz wrote:
>>> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>
>>>> Generate a unique 128bit identifier for each host and pass that value to
>>>> firmware in the INIT_HCA command if it reports the sw_owner_id
>>>> capo ability. This value is used by FW to determine if functions are in
>>>> use by the same host.
>>> "capo ability"?
>>>
>>> did you want to say the same driver instance? b/c multiple instances
>>> of the driver can run on a host
>> No, we want to set the same SW owner ID set for all devices for that
>> host. Not per driver instance.
> This commit message is confusing, can you explain it better please?
>
> I'm guessing the owner_id is used to match individual port's PCI
> functions?
>
> So, any 'host' that can see multiple PCI function and share ports
> should set the same ID?
Right, ports can only be bound if they have the same software owner ID set. If the ID is not the same the bind command will fail.
> What happens when a driver is detached? Eg I rebind the PCI function
> to VFIO. Does something reset the ID?
Yes, next time INIT_HCA is called the owner ID is set again.
> Not entirely sure why this is being done at all, shouldn't the PCI
> functions be rather un-ambiguous?
It's to prevent binding a device as a slave port if it's passed through to a different VM.
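
To put the whole flow in one place (a condensed sketch of what the
patch does, not the literal code):

	/* module scope: one random 128-bit id, generated once at
	 * mlx5_core module init, so it is shared by every device
	 * instance on the host
	 */
	static u32 sw_owner_id[4];

	static int __init init(void)
	{
		get_random_bytes(&sw_owner_id, sizeof(sw_owner_id));
		/* ... */
	}

	/* every mlx5_load_one() passes the same id down */
	err = mlx5_cmd_init_hca(dev, sw_owner_id);

	/* and mlx5_cmd_init_hca() only writes it when FW reports the cap */
	if (MLX5_CAP_GEN(dev, sw_owner_id))
		for (i = 0; i < 4; i++)
			MLX5_ARRAY_SET(init_hca_in, in, sw_owner_id, i,
				       sw_owner_id[i]);

FW then refuses to bind two functions whose INIT_HCA carried different
sw_owner_id values.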
> Jason


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 02/15] net/mlx5: Set software owner ID during init HCA
       [not found]                     ` <37588052-e7cd-b2ad-b2e0-91e03a9401d1-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-12-27 18:10                       ` Jason Gunthorpe
  0 siblings, 0 replies; 47+ messages in thread
From: Jason Gunthorpe @ 2017-12-27 18:10 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Or Gerlitz, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Parav Pandit, Saeed Mahameed, Leon Romanovsky

On Wed, Dec 27, 2017 at 12:06:53PM -0600, Daniel Jurgens wrote:

> > Not entirely sure why this is being done at all, shouldn't the PCI
> > functions be rather un-ambiguous?

> It's to prevent binding a device as a slave port if it's passed
> through to a different VM.

Okay. The commit message should discuss the failure mode that the
change is trying to prevent. (ie the 'why')

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE
       [not found]       ` <382ba516-bf7b-0a0b-7a9f-604cbf805c80-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-12-27 21:22         ` Or Gerlitz
       [not found]           ` <CAJ3xEMi3CvATj-vpfy8E89=ZoPL1mmoiu8HWea_NEwfV2++ZpQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Or Gerlitz @ 2017-12-27 21:22 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Doug Ledford, Jason Gunthorpe, RDMA mailing list, Parav Pandit,
	Leon Romanovsky, Saeed Mahameed

On Wed, Dec 27, 2017 at 5:22 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:

> If the PCI device were a slave there would be no IB device for it in the VM.

This creates a regression for users who update their kernel and now
find their VMs stop working.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 02/15] net/mlx5: Set software owner ID during init HCA
       [not found]             ` <20171225053534.GT2942-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-12-27 21:28               ` Or Gerlitz
  0 siblings, 0 replies; 47+ messages in thread
From: Or Gerlitz @ 2017-12-27 21:28 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Daniel Jurgens, Doug Ledford, Jason Gunthorpe, RDMA mailing list,
	Parav Pandit, Saeed Mahameed

On Mon, Dec 25, 2017 at 7:35 AM, Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On Mon, Dec 25, 2017 at 12:00:43AM +0200, Or Gerlitz wrote:
>> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> > From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> >
>> >
>> > +#define MLX5_ARRAY_SET(typ, p, fld, idx, v) do { \
>> > +       BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 32); \
>> > +       MLX5_SET(typ, p, fld[idx], v); \
>> > +} while (0)
>> > +
>>
>> Parav, Dan, we I pointed to you during the dscp trust mode work, changes
>> to this area of the code should have the RB sig of Eli Cohen or Saeed if you
>> can't get Eli, please do that.

> This patch and other patches in this series were seen by Saeed during
> preparation to shared code pull request. In addition, he was being added
> as an reviewer for the months.

The upstream way to approach a reviewer who is not responsive enough
is to put a

Cc:

line for them in the signatures area of the change log,

but you can also simply give them a call, or shout over the cubicle or
near the coffee machine; it will just work, I can assure you.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE
       [not found]           ` <CAJ3xEMi3CvATj-vpfy8E89=ZoPL1mmoiu8HWea_NEwfV2++ZpQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-27 23:20             ` Daniel Jurgens
       [not found]               ` <a3cf271c-f66c-074a-88bd-b48b39959e6b-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Jurgens @ 2017-12-27 23:20 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, Jason Gunthorpe, RDMA mailing list, Parav Pandit,
	Leon Romanovsky, Saeed Mahameed

On 12/27/2017 3:22 PM, Or Gerlitz wrote:
> On Wed, Dec 27, 2017 at 5:22 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>
>> If the PCI device were a slave there would be no IB device for it in the VM.
> This creates a regression for users that update their kernel and now
> their VMs stop working

The feature is not enabled by default.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE
       [not found]               ` <a3cf271c-f66c-074a-88bd-b48b39959e6b-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-12-29 18:03                 ` Jason Gunthorpe
       [not found]                   ` <20171229180313.GD6513-uk2M96/98Pc@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Jason Gunthorpe @ 2017-12-29 18:03 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Or Gerlitz, Doug Ledford, RDMA mailing list, Parav Pandit,
	Leon Romanovsky, Saeed Mahameed

On Wed, Dec 27, 2017 at 05:20:26PM -0600, Daniel Jurgens wrote:
> On 12/27/2017 3:22 PM, Or Gerlitz wrote:
> > On Wed, Dec 27, 2017 at 5:22 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> >
> >> If the PCI device were a slave there would be no IB device for it in the VM.
> > This creates a regression for users that update their kernel and now
> > their VMs stop working
> 
> The feature is not enabled by default.

You never described how it is enabled in the commit messages either..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE
       [not found]                   ` <20171229180313.GD6513-uk2M96/98Pc@public.gmane.org>
@ 2017-12-29 20:52                     ` Daniel Jurgens
       [not found]                       ` <9d102fb2-122c-d7e9-2521-cf61b708d8c0-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Jurgens @ 2017-12-29 20:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Or Gerlitz, Doug Ledford, RDMA mailing list, Parav Pandit,
	Leon Romanovsky, Saeed Mahameed

On 12/29/2017 12:03 PM, Jason Gunthorpe wrote:
> On Wed, Dec 27, 2017 at 05:20:26PM -0600, Daniel Jurgens wrote:
>> On 12/27/2017 3:22 PM, Or Gerlitz wrote:
>>> On Wed, Dec 27, 2017 at 5:22 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>>
>>>> If the PCI device were a slave there would be no IB device for it in the VM.
>>> This creates a regression for users that update their kernel and now
>>> their VMs stop working
>> The feature is not enabled by default.
> You never described how it is enabled in the commit messages either..

It's enabled via FW ini file/PSID or mlxconfig.  Do we ever describe that?

>
> Jason


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE
       [not found]                       ` <9d102fb2-122c-d7e9-2521-cf61b708d8c0-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-12-29 21:27                         ` Jason Gunthorpe
  0 siblings, 0 replies; 47+ messages in thread
From: Jason Gunthorpe @ 2017-12-29 21:27 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Or Gerlitz, Doug Ledford, RDMA mailing list, Parav Pandit,
	Leon Romanovsky, Saeed Mahameed

On Fri, Dec 29, 2017 at 02:52:18PM -0600, Daniel Jurgens wrote:
> On 12/29/2017 12:03 PM, Jason Gunthorpe wrote:
> > On Wed, Dec 27, 2017 at 05:20:26PM -0600, Daniel Jurgens wrote:
> >> On 12/27/2017 3:22 PM, Or Gerlitz wrote:
> >>> On Wed, Dec 27, 2017 at 5:22 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> >>>
> >>>> If the PCI device were a slave there would be no IB device for it in the VM.
> >>> This creates a regression for users that update their kernel and now
> >>> their VMs stop working
> >> The feature is not enabled by default.
> > You never described how it is enabled in the commit messages either..
> 
> It's enabled vi FW ini file/PSID or mlxconfig.  Do we ever describe that?

If you add a new feature, the commit message should briefly discuss
how it is used, yes. Just saying 'Dual port mode is enabled by
configuring the mlx5 firmware' is probably sufficient in this case.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH rdma-next 02/15] net/mlx5: Set software owner ID during init HCA
       [not found]             ` <b329f348-2462-2bf4-ee06-064e798c9b86-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2017-12-27 17:52               ` Jason Gunthorpe
@ 2018-01-02  9:34               ` Or Gerlitz
       [not found]                 ` <CAJ3xEMixQ-Tw9gUYYPhNLvrEbntNTQBvLcVUe+x130hkEq15Bg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 47+ messages in thread
From: Or Gerlitz @ 2018-01-02  9:34 UTC (permalink / raw)
  To: Daniel Jurgens, Parav Pandit; +Cc: RDMA mailing list, Saeed Mahameed

On Wed, Dec 27, 2017 at 5:27 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On 12/24/2017 4:00 PM, Or Gerlitz wrote:

>>> @@ -5531,6 +5533,7 @@ struct mlx5_ifc_init_hca_in_bits {
>>>         u8         op_mod[0x10];
>>>
>>>         u8         reserved_at_40[0x40];
>>> +       u8         sw_owner_id[4][0x20];
>>>  };

>> can we do here just a plain addition of bits? I don't have the code
>> in front of me, but it seems suspicious

> I don't understand what you are asking.  This is what the tool generated.

Dan, the tool is serving us and not the other way around, so

before your changes we had:

struct mlx5_ifc_init_hca_in_bits {
        u8         opcode[0x10];
        u8         reserved_at_10[0x10];

        u8         reserved_at_20[0x10];
        u8         op_mod[0x10];

        u8         reserved_at_40[0x40];
};

and now we have a struct which is 128 bits bigger
but is used for the same FW command

struct mlx5_ifc_init_hca_in_bits {
        u8         opcode[0x10];
        u8         reserved_at_10[0x10];

        u8         reserved_at_20[0x10];
        u8         op_mod[0x10];

        u8         reserved_at_40[0x40];

        u8         sw_owner_id[16][0x8];
};

my question is if/why this is legal. Maybe I am wrong, but
I don't think we can just go and change the size of the inbox
payload for a command. Maybe there's another (silent) reserved
space which is not present in the original form of the struct?

Or.

* Re: [PATCH rdma-next 02/15] net/mlx5: Set software owner ID during init HCA
       [not found]                 ` <CAJ3xEMixQ-Tw9gUYYPhNLvrEbntNTQBvLcVUe+x130hkEq15Bg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-02 10:10                   ` Leon Romanovsky
  0 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2018-01-02 10:10 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Daniel Jurgens, Parav Pandit, RDMA mailing list, Saeed Mahameed


On Tue, Jan 02, 2018 at 11:34:03AM +0200, Or Gerlitz wrote:
> On Wed, Dec 27, 2017 at 5:27 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> > On 12/24/2017 4:00 PM, Or Gerlitz wrote:
>
> >>> @@ -5531,6 +5533,7 @@ struct mlx5_ifc_init_hca_in_bits {
> >>>         u8         op_mod[0x10];
> >>>
> >>>         u8         reserved_at_40[0x40];
> >>> +       u8         sw_owner_id[4][0x20];
> >>>  };
>
> >> can we do here just a plain addition of bits? I don't have the code
> >> in front of me, but it seems suspicious
>
> > I don't understand what you are asking.  This is what the tool generated.
>
> Dan, the tool is serving us and not the other way around, so
>
> before your changes we had:
>
> struct mlx5_ifc_init_hca_in_bits {
>         u8         opcode[0x10];
>         u8         reserved_at_10[0x10];
>
>         u8         reserved_at_20[0x10];
>         u8         op_mod[0x10];
>
>         u8         reserved_at_40[0x40];
> };
>
> and now we have a struct which is 128b bigger
> but is used for the same FW command
>
> struct mlx5_ifc_init_hca_in_bits {
>         u8         opcode[0x10];
>         u8         reserved_at_10[0x10];
>
>         u8         reserved_at_20[0x10];
>         u8         op_mod[0x10];
>
>         u8         reserved_at_40[0x40];
>
>         u8         sw_owner_id[16][0x8];
> };
>
> my question is if/why this is legal. Maybe I am wrong, but
> I don't think we can just go and change the size of the inbox
> payload for a command.

Yes, you are wrong. It is ok to change the payload.
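
For completeness, a standalone sketch (plain C, not the kernel header)
that only checks the arithmetic in question: the [4][0x20] spelling in
the patch and the [16][0x8] spelling above both add exactly 128 bits,
growing the inbox layout from 0x80 to 0x100 bits.  Whether the FW
accepts the longer inbox is a separate question from the size math.

#include <stdio.h>

/* mirrors the mlx5_ifc convention: array length == field width in bits */
struct init_hca_in_old {
        unsigned char opcode[0x10];
        unsigned char reserved_at_10[0x10];
        unsigned char reserved_at_20[0x10];
        unsigned char op_mod[0x10];
        unsigned char reserved_at_40[0x40];
};

struct init_hca_in_new {
        unsigned char opcode[0x10];
        unsigned char reserved_at_10[0x10];
        unsigned char reserved_at_20[0x10];
        unsigned char op_mod[0x10];
        unsigned char reserved_at_40[0x40];
        unsigned char sw_owner_id[4][0x20];     /* == [16][0x8] == 128 bits */
};

int main(void)
{
        /* sizeof counts one byte per "bit" under this convention */
        printf("old %zu, new %zu, delta %zu\n",
               sizeof(struct init_hca_in_old),
               sizeof(struct init_hca_in_new),
               sizeof(struct init_hca_in_new) - sizeof(struct init_hca_in_old));
        /* prints: old 128, new 256, delta 128 */
        return 0;
}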

Thanks

>
> Or.

* Re: [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE
       [not found]       ` <CAJ3xEMgpP0Sbj4vY3_pJDjrDqHLHmkaTSGLyVuBY+aoC6VUnHA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-03 18:58         ` Daniel Jurgens
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Jurgens @ 2018-01-03 18:58 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Doug Ledford, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Parav Pandit, Leon Romanovsky, Saeed Mahameed

On 12/25/2017 3:24 AM, Or Gerlitz wrote:
> On Sun, Dec 24, 2017 at 11:47 PM, Or Gerlitz <gerlitz.or-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky wrote:
>
>>> SR-IOV devices follow the same pattern as the physical ones. VFs of a
>>> master port can bind VFs of slave ports, if available, and operate as dual port
>>> devices.
>
>> We still have two mlx5 EN devices, right?! and from libvirt point of view
>> business is just as usual,
>> a VM will be assigned with one VF from any of the core devices (master only,
>> slave only or both) and this will be an IB device with one port - correct?
> what happens if the admin provisions two VF devices for each of the
> following patterns:
>
> (slave, slave)
No IB devices for this pairing.

> (master, master)

2 IB devices, each showing 2 ports, but the slave port will be down.

> (master, slave) <-- this one you were targeting
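
To make these outcomes easy to check from inside a VM, here is a
minimal libibverbs sketch (user space only, not part of this series,
error handling trimmed) that lists every IB device with its port count
and per-port state; for the (master, master) pairing above it would
show two devices, each with two ports, one of them down:

#include <stdint.h>
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
        int num, i;
        struct ibv_device **list = ibv_get_device_list(&num);

        if (!list)
                return 1;
        for (i = 0; i < num; i++) {
                struct ibv_context *ctx = ibv_open_device(list[i]);
                struct ibv_device_attr dev_attr;
                uint8_t port;

                if (!ctx)
                        continue;
                if (ibv_query_device(ctx, &dev_attr)) {
                        ibv_close_device(ctx);
                        continue;
                }
                printf("%s: %u port(s)\n",
                       ibv_get_device_name(list[i]), dev_attr.phys_port_cnt);
                for (port = 1; port <= dev_attr.phys_port_cnt; port++) {
                        struct ibv_port_attr port_attr;

                        if (!ibv_query_port(ctx, port, &port_attr))
                                printf("  port %u: state %d\n",
                                       port, port_attr.state);
                }
                ibv_close_device(ctx);
        }
        ibv_free_device_list(list);
        return 0;
}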



* Re: [PATCH rdma-next 14/15] IB/mlx5: Don't advertise RAW QP support in dual port mode
       [not found]                 ` <CAJ3xEMhtssDWuu8JAB_jhyofKbkeo4vYTLixLeta6vPzKQOp+A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-03 20:35                   ` Daniel Jurgens
       [not found]                     ` <ab3fb697-7b2c-685c-2151-5f48d5d7270a-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Jurgens @ 2018-01-03 20:35 UTC (permalink / raw)
  To: Or Gerlitz, Leon Romanovsky
  Cc: Or Gerlitz, Doug Ledford, Jason Gunthorpe,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Parav Pandit

On 12/25/2017 3:17 AM, Or Gerlitz wrote:
> On Mon, Dec 25, 2017 at 9:16 AM, Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> On Mon, Dec 25, 2017 at 12:16:15AM +0200, Or Gerlitz wrote:
>>> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>
>>>> When operating in dual port RoCE mode RAW QPs are not supposed
>>>> to work on the slave port.
>>> why?
>> HW creates a flow table restriction.
>
> cryptic to me, either put a proper explanation on the V2 change log
> for this patch
> so we can review it then or explain it here if you want a review on V1

We don't report RAW QP support on a per port basis.  It's per device.  The user doesn't know which port is the master/slave, so we shouldn't support RAW QP at all in this mode. I'll update the commit message to state that.


* Re: [PATCH rdma-next 14/15] IB/mlx5: Don't advertise RAW QP support in dual port mode
       [not found]                     ` <ab3fb697-7b2c-685c-2151-5f48d5d7270a-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2018-01-03 20:39                       ` Or Gerlitz
       [not found]                         ` <CAJ3xEMimYj8=USqjHEF_uiwJCNoeT5qpbNMYTWUX7EgTVSeuuw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Or Gerlitz @ 2018-01-03 20:39 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Leon Romanovsky, Or Gerlitz, Doug Ledford, Jason Gunthorpe,
	RDMA mailing list, Parav Pandit

On Wed, Jan 3, 2018 at 10:35 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On 12/25/2017 3:17 AM, Or Gerlitz wrote:
>> On Mon, Dec 25, 2017 at 9:16 AM, Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>> On Mon, Dec 25, 2017 at 12:16:15AM +0200, Or Gerlitz wrote:
>>>> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>>> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>
>>>>> When operating in dual port RoCE mode RAW QPs are not supposed
>>>>> to work on the slave port.
>>>> why?
>>> HW creates a flow table restriction.
>>
>> cryptic to me, either put a proper explanation on the V2 change log
>> for this patch
>> so we can review it then or explain it here if you want a review on V1
>
> We don't report RAW QP support on a per port basis.  It's per device.
> The user doesn't know which port is the master/slave, so we shouldn't
> support RAW QP at all in this mode. I'll update the commit message to state that.

in mlx4 we do have an IB device with two ports and RAW QP is supported;
what is different there?

* Re: [PATCH rdma-next 14/15] IB/mlx5: Don't advertise RAW QP support in dual port mode
       [not found]                         ` <CAJ3xEMimYj8=USqjHEF_uiwJCNoeT5qpbNMYTWUX7EgTVSeuuw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-03 20:43                           ` Daniel Jurgens
       [not found]                             ` <6f114e3c-2e16-f7ce-2e60-d06113afc9b8-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Daniel Jurgens @ 2018-01-03 20:43 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Leon Romanovsky, Or Gerlitz, Doug Ledford, Jason Gunthorpe,
	RDMA mailing list, Parav Pandit

On 1/3/2018 2:39 PM, Or Gerlitz wrote:
> On Wed, Jan 3, 2018 at 10:35 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> On 12/25/2017 3:17 AM, Or Gerlitz wrote:
>>> On Mon, Dec 25, 2017 at 9:16 AM, Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>>> On Mon, Dec 25, 2017 at 12:16:15AM +0200, Or Gerlitz wrote:
>>>>> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>>>> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>>
>>>>>> When operating in dual port RoCE mode RAW QPs are not supposed
>>>>>> to work on the slave port.
>>>>> why?
>>>> HW creates a flow table restriction.
>>> cryptic to me, either put a proper explanation on the V2 change log
>>> for this patch
>>> so we can review it then or explain it here if you want a review on V1
>> We don't report RAW QP support on a per port basis.  It's per device.
>> The user doesn't know which port is the master/slave, so we shouldn't
>> support RAW QP at all in this mode. I'll update the commit message to state that.
> in mlx4 we do have IB device with two ports and RAW QP is supported,
> what is different there?

mlx4 has a completely different HW architecture.  It's 2 ports on 1 PCI function.  This feature doesn't do that for mlx5; there are still 2 PCI functions.  It uses steering rules to route RDMA traffic arriving at the 2nd port to resources in the domain of the 1st port.  RDMA traffic is easy to identify and steer accordingly; generic traffic over a Raw QP is not.
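
To spell out why one class of traffic is steerable and the other is
not: presumably the steering works because RoCE v1 frames carry a
dedicated ethertype (0x8915) and RoCE v2 packets are UDP datagrams to
the well-known destination port 4791, so a fixed match rule can
redirect them to the master function's domain, while frames meant for a
raw packet QP can be arbitrary Ethernet traffic with no such fixed
signature.  A tiny sketch of that classification (illustration only,
not driver code):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ROCE_V2_UDP_DPORT 4791  /* IANA-assigned UDP port for RoCE v2 */

/* The kind of match a single steering rule can express: fixed IP
 * protocol and fixed destination port.  Traffic meant for a raw packet
 * QP has no comparable fixed signature to match on. */
static bool is_roce_v2(uint8_t ip_proto, uint16_t udp_dport)
{
        return ip_proto == 17 /* UDP */ && udp_dport == ROCE_V2_UDP_DPORT;
}

int main(void)
{
        printf("UDP/4791 -> %s\n",
               is_roce_v2(17, 4791) ? "steer to master" : "leave as is");
        printf("TCP/80   -> %s\n",
               is_roce_v2(6, 80) ? "steer to master" : "leave as is");
        return 0;
}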


* Re: [PATCH rdma-next 14/15] IB/mlx5: Don't advertise RAW QP support in dual port mode
       [not found]                             ` <6f114e3c-2e16-f7ce-2e60-d06113afc9b8-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2018-01-03 20:47                               ` Or Gerlitz
       [not found]                                 ` <CAJ3xEMjZfTJP1fi=VosXL_v=AUueWZc54iOsXz1ynikYm-jcjA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Or Gerlitz @ 2018-01-03 20:47 UTC (permalink / raw)
  To: Daniel Jurgens
  Cc: Leon Romanovsky, Or Gerlitz, Doug Ledford, Jason Gunthorpe,
	RDMA mailing list, Parav Pandit

On Wed, Jan 3, 2018 at 10:43 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On 1/3/2018 2:39 PM, Or Gerlitz wrote:
>> On Wed, Jan 3, 2018 at 10:35 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>> On 12/25/2017 3:17 AM, Or Gerlitz wrote:
>>>> On Mon, Dec 25, 2017 at 9:16 AM, Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>>>> On Mon, Dec 25, 2017 at 12:16:15AM +0200, Or Gerlitz wrote:
>>>>>> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>>>>> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>>>>>
>>>>>>> When operating in dual port RoCE mode RAW QPs are not supposed
>>>>>>> to work on the slave port.
>>>>>> why?
>>>>> HW creates a flow table restriction.
>>>> cryptic to me, either put a proper explanation on the V2 change log
>>>> for this patch
>>>> so we can review it then or explain it here if you want a review on V1
>>> We don't report RAW QP support on a per port basis.  It's per device.
>>> The user doesn't know which port is the master/slave, so we shouldn't
>>> support RAW QP at all in this mode. I'll update the commit message to state that.
>> in mlx4 we do have IB device with two ports and RAW QP is supported,
>> what is different there?
>
> mlx4 has a completely different HW architecture.  It's 2 ports on 1 PCI function.  This feature doesn't do that for mlx5, there are still 2 PCI functions.  It uses steering rules to route RDMA traffic arriving at the 2nd port to resources in the domain of the 1st port. RDMA traffic is easy to identify and steer accordingly, generic traffic over a Raw QP is not.

OK, so this is basically not easy, but not undoable either. Aren't we creating
possible regressions for people that used RAW QP on mlx5 devices?
These are open systems and the code is out there for people to use. Now
the systems that move to the new mode have inferior functionality.

* Re: [PATCH rdma-next 14/15] IB/mlx5: Don't advertise RAW QP support in dual port mode
       [not found]                                 ` <CAJ3xEMjZfTJP1fi=VosXL_v=AUueWZc54iOsXz1ynikYm-jcjA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-04  5:21                                   ` Leon Romanovsky
  0 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2018-01-04  5:21 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Daniel Jurgens, Or Gerlitz, Doug Ledford, Jason Gunthorpe,
	RDMA mailing list, Parav Pandit


On Wed, Jan 03, 2018 at 10:47:14PM +0200, Or Gerlitz wrote:
> On Wed, Jan 3, 2018 at 10:43 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> > On 1/3/2018 2:39 PM, Or Gerlitz wrote:
> >> On Wed, Jan 3, 2018 at 10:35 PM, Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> >>> On 12/25/2017 3:17 AM, Or Gerlitz wrote:
> >>>> On Mon, Dec 25, 2017 at 9:16 AM, Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> >>>>> On Mon, Dec 25, 2017 at 12:16:15AM +0200, Or Gerlitz wrote:
> >>>>>> On Sun, Dec 24, 2017 at 2:57 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> >>>>>>> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>>>>>>
> >>>>>>> When operating in dual port RoCE mode RAW QPs are not supposed
> >>>>>>> to work on the slave port.
> >>>>>> why?
> >>>>> HW creates a flow table restriction.
> >>>> cryptic to me, either put a proper explanation on the V2 change log
> >>>> for this patch
> >>>> so we can review it then or explain it here if you want a review on V1
> >>> We don't report RAW QP support on a per port basis.  It's per device.
> >>> The user doesn't know which port is the master/slave, so we shouldn't
> >>> support RAW QP at all in this mode. I'll update the commit message to state that.
> >> in mlx4 we do have IB device with two ports and RAW QP is supported,
> >> what is different there?
> >
> > mlx4 has a completely different HW architecture.  It's 2 ports on 1 PCI function.  This feature doesn't do that for mlx5, there are still 2 PCI functions.  It uses steering rules to route RDMA traffic arriving at the 2nd port to resources in the domain of the 1st port. RDMA traffic is easy to identify and steer accordingly, generic traffic over a Raw QP is not.
>
> OK, so this is basically not easy, but not undoable either. Aren't we creating
> possible regressions for people that used RAW QP on mlx5 devices?
> These are open systems and the code is out there for people to use. Now
> the systems that move to the new mode have inferior functionality.

No, we are not creating a regression, mainly for two reasons:
1. This dual-port functionality is enabled by special FW and is not
available by default.
2. Programmers were supposed to query device capabilities and ensure
that RAW QP is supported prior to using it, like any other feature
advertised by those capabilities. This is the whole idea of device
capabilities (a sketch of such a probe follows below).
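
Here is what that probing can look like in practice, as a minimal
user-space sketch (not from this series, error handling trimmed) that
tries to create a raw packet QP and falls back if the device refuses:

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
        int num;
        struct ibv_device **list = ibv_get_device_list(&num);
        struct ibv_context *ctx;
        struct ibv_pd *pd;
        struct ibv_cq *cq;
        struct ibv_qp_init_attr attr = {0};
        struct ibv_qp *qp;

        if (!list || num == 0)
                return 1;
        ctx = ibv_open_device(list[0]);
        if (!ctx)
                return 1;
        pd = ibv_alloc_pd(ctx);
        cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);
        if (!pd || !cq)
                return 1;

        attr.qp_type          = IBV_QPT_RAW_PACKET;  /* needs CAP_NET_RAW */
        attr.send_cq          = cq;
        attr.recv_cq          = cq;
        attr.cap.max_send_wr  = 1;
        attr.cap.max_recv_wr  = 1;
        attr.cap.max_send_sge = 1;
        attr.cap.max_recv_sge = 1;

        qp = ibv_create_qp(pd, &attr);
        if (!qp) {
                printf("%s: raw packet QPs unavailable, fall back\n",
                       ibv_get_device_name(list[0]));
        } else {
                printf("%s: raw packet QPs available\n",
                       ibv_get_device_name(list[0]));
                ibv_destroy_qp(qp);
        }

        ibv_destroy_cq(cq);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(list);
        return 0;
}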

Thanks


end of thread, other threads:[~2018-01-04  5:21 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-24 12:57 [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE Leon Romanovsky
     [not found] ` <20171224125741.25464-1-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2017-12-24 12:57   ` [PATCH rdma-next 01/15] net/mlx5: Fix race for multiple RoCE enable Leon Romanovsky
     [not found]     ` <20171224125741.25464-2-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2017-12-24 21:53       ` Or Gerlitz
     [not found]         ` <CAJ3xEMg8U_4DpYEWa8t1QpccNzjU6f0shEtQ7fRWg-EtGupy+w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-25  5:59           ` Leon Romanovsky
     [not found]             ` <20171225055916.GU2942-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-12-25  9:21               ` Or Gerlitz
2017-12-24 12:57   ` [PATCH rdma-next 02/15] net/mlx5: Set software owner ID during init HCA Leon Romanovsky
     [not found]     ` <20171224125741.25464-3-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2017-12-24 22:00       ` Or Gerlitz
     [not found]         ` <CAJ3xEMiq-4DGFW-Z3hX3NfsSGXD6bm_uarGF_cm6K7+YuutJBQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-25  5:35           ` Leon Romanovsky
     [not found]             ` <20171225053534.GT2942-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-12-27 21:28               ` Or Gerlitz
2017-12-27 15:27           ` Daniel Jurgens
     [not found]             ` <b329f348-2462-2bf4-ee06-064e798c9b86-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-12-27 17:52               ` Jason Gunthorpe
     [not found]                 ` <20171227175208.GD31310-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-12-27 18:06                   ` Daniel Jurgens
     [not found]                     ` <37588052-e7cd-b2ad-b2e0-91e03a9401d1-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-12-27 18:10                       ` Jason Gunthorpe
2018-01-02  9:34               ` Or Gerlitz
     [not found]                 ` <CAJ3xEMixQ-Tw9gUYYPhNLvrEbntNTQBvLcVUe+x130hkEq15Bg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-02 10:10                   ` Leon Romanovsky
2017-12-24 12:57   ` [PATCH rdma-next 03/15] IB/core: Change roce_rescan_device to return void Leon Romanovsky
2017-12-24 12:57   ` [PATCH rdma-next 04/15] IB/mlx5: Reduce the use of num_port capability Leon Romanovsky
2017-12-24 12:57   ` [PATCH rdma-next 05/15] IB/mlx5: Make netdev notifications multiport capable Leon Romanovsky
2017-12-24 12:57   ` [PATCH rdma-next 06/15] {net,IB}/mlx5: Manage port association for multiport RoCE Leon Romanovsky
2017-12-24 12:57   ` [PATCH rdma-next 07/15] IB/mlx5: Move IB event processing onto a workqueue Leon Romanovsky
2017-12-24 12:57   ` [PATCH rdma-next 08/15] IB/mlx5: Implement dual port functionality in query routines Leon Romanovsky
2017-12-24 12:57   ` [PATCH rdma-next 09/15] IB/mlx5: Change debugfs to have per port contents Leon Romanovsky
2017-12-24 12:57   ` [PATCH rdma-next 10/15] IB/mlx5: Update counter implementation for dual port RoCE Leon Romanovsky
2017-12-24 12:57   ` [PATCH rdma-next 11/15] {net,IB}/mlx5: Change set_roce_gid to take a port number Leon Romanovsky
2017-12-24 12:57   ` [PATCH rdma-next 12/15] IB/mlx5: Route MADs for dual port RoCE Leon Romanovsky
2017-12-24 12:57   ` [PATCH rdma-next 13/15] IB/mlx5: Use correct mdev for vport queries in ib_virt Leon Romanovsky
     [not found]     ` <20171224125741.25464-14-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2017-12-24 22:14       ` Or Gerlitz
     [not found]         ` <CAJ3xEMgyD693JPtu_vag-q3kVP=Nyag=dPEp15AZcyX62FwkPw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-25  7:25           ` Leon Romanovsky
2017-12-24 12:57   ` [PATCH rdma-next 14/15] IB/mlx5: Don't advertise RAW QP support in dual port mode Leon Romanovsky
     [not found]     ` <20171224125741.25464-15-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2017-12-24 22:16       ` Or Gerlitz
     [not found]         ` <CAJ3xEMgAtd1PKN2h1NFj=Mt72msuU5bpFy0jO4KFXxXxGRf7hA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-25  7:16           ` Leon Romanovsky
     [not found]             ` <20171225071613.GW2942-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-12-25  9:17               ` Or Gerlitz
     [not found]                 ` <CAJ3xEMhtssDWuu8JAB_jhyofKbkeo4vYTLixLeta6vPzKQOp+A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-03 20:35                   ` Daniel Jurgens
     [not found]                     ` <ab3fb697-7b2c-685c-2151-5f48d5d7270a-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2018-01-03 20:39                       ` Or Gerlitz
     [not found]                         ` <CAJ3xEMimYj8=USqjHEF_uiwJCNoeT5qpbNMYTWUX7EgTVSeuuw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-03 20:43                           ` Daniel Jurgens
     [not found]                             ` <6f114e3c-2e16-f7ce-2e60-d06113afc9b8-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2018-01-03 20:47                               ` Or Gerlitz
     [not found]                                 ` <CAJ3xEMjZfTJP1fi=VosXL_v=AUueWZc54iOsXz1ynikYm-jcjA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-04  5:21                                   ` Leon Romanovsky
2017-12-24 12:57   ` [PATCH rdma-next 15/15] net/mlx5: Set num_vhca_ports capability Leon Romanovsky
2017-12-24 21:48   ` [PATCH rdma-next 00/15] Dual Port mlx5 IB Device for RoCE Or Gerlitz
     [not found] ` <CAJ3xEMhZgEee+VLpV4bV150siOdXwpcp64AGqeqr5Y2o--WRdw@mail.gmail.com>
     [not found]   ` <CAJ3xEMhZgEee+VLpV4bV150siOdXwpcp64AGqeqr5Y2o--WRdw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-25  9:24     ` Or Gerlitz
     [not found]       ` <CAJ3xEMgpP0Sbj4vY3_pJDjrDqHLHmkaTSGLyVuBY+aoC6VUnHA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-03 18:58         ` Daniel Jurgens
2017-12-27 15:22     ` Daniel Jurgens
     [not found]       ` <382ba516-bf7b-0a0b-7a9f-604cbf805c80-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-12-27 21:22         ` Or Gerlitz
     [not found]           ` <CAJ3xEMi3CvATj-vpfy8E89=ZoPL1mmoiu8HWea_NEwfV2++ZpQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-27 23:20             ` Daniel Jurgens
     [not found]               ` <a3cf271c-f66c-074a-88bd-b48b39959e6b-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-12-29 18:03                 ` Jason Gunthorpe
     [not found]                   ` <20171229180313.GD6513-uk2M96/98Pc@public.gmane.org>
2017-12-29 20:52                     ` Daniel Jurgens
     [not found]                       ` <9d102fb2-122c-d7e9-2521-cf61b708d8c0-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-12-29 21:27                         ` Jason Gunthorpe
