* [PATCH rdma-next 0/5] mlx5 IB RoCE LAG
From: Leon Romanovsky @ 2016-09-18 10:08 UTC
  To: dledford@redhat.com; +Cc: linux-rdma@vger.kernel.org

This feature tracks, via netdev LAG events, the bonding
and link status of each port's PF netdev.

When both of the card's PF netdevs are enslaved exclusively
to the same bond master, the LAG state becomes active.
The bond mode must be one of:
1 (active-backup), 2 (XOR), or 4 (802.3ad).
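
For illustration, the merged mlx5_core infrastructure keys off the
netdev LAG TX type that arrives with the bonding events. A minimal
sketch of the mode check, assuming the upstream enum netdev_lag_tx_type
from <linux/netdevice.h> (the helper name is ours, for illustration
only):

#include <linux/netdevice.h>

static bool lag_tx_type_supported(enum netdev_lag_tx_type tx_type)
{
	/* Bond mode 1 (active-backup) reports ACTIVEBACKUP; modes
	 * 2 (XOR) and 4 (802.3ad) both report HASH. Any other mode
	 * keeps the LAG state inactive.
	 */
	return tx_type == NETDEV_LAG_TX_TYPE_ACTIVEBACKUP ||
	       tx_type == NETDEV_LAG_TX_TYPE_HASH;
}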

While LAG is active, a single IB device is presented for both ports.
This allows load balancing and high availability to be
managed by the driver, abstracting them away from ULPs.
Such an IB device is named "mlx5_bond_X".

The LAG mode is determined by the bond driver.
In load balancing (modes 2 and 4), QPs are assigned to ports
in a round-robin fashion on the QP transition from RESET to INIT.
In high availability (mode 1), all QPs are assigned to the active
slave, as determined by the bond driver.
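
As a sketch, the round-robin assignment amounts to the following
(patch 5/5 open-codes this in __mlx5_ib_modify_qp(); the helper
wrapper here is illustrative):

static u8 lag_tx_affinity(struct mlx5_ib_dev *dev)
{
	/* Ports are numbered 1..MLX5_MAX_PORTS; a shared atomic
	 * counter alternates newly initialized QPs between them.
	 */
	return (unsigned int)atomic_add_return(1, &dev->roce.next_port) %
	       MLX5_MAX_PORTS + 1;
}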

Please note that in load-balancing mode, if a port's link goes down,
all QPs are moved to the other port, and are moved back once both
ports are up again.

The feature is split into mlx5_core and mlx5_ib parts; the mlx5_core
part was already submitted and accepted:
  1) mlx5_core: Implements RoCE LAG infrastructure and API.
    a) Bond device LAG events tracking.
    b) Decisions as to whether LAG state is active.
    c) Firmware commands to enter and exit LAG state.
    d) Manages HA behavior.
  2) mlx5_ib: Uses core infrastructure to implement RoCE LAG.
    a) Chooses an "mlx5_bond_X" device name when LAG is active.
    b) Implements QP load balancing.
    c) Merges steering, such that both ports' traffic arrives on
       PF0's root flow table. This is used for processing
       both ports' usermode Ethernet traffic on PF0.
    d) Creates a special flow table that diverts all
       non-usermode Ethernet traffic received on port 2
       back to PF1's root flow table (since non-usermode Ethernet
       should not be affected by LAG).

Available in the "topic/roce-lag-mlx5" topic branch of this git repo:
git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git

Or for browsing:
https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/roce-lag-mlx5

Thanks

Aviv Heller (5):
  IB/mlx5: Port events in RoCE now rely on netdev events
  IB/mlx5: Merge vports flow steering during LAG
  IB/mlx5: Port status track LAG master, when LAG is active
  IB/mlx5: Set unique device name on LAG
  IB/mlx5: LAG QP load balancing

 drivers/infiniband/hw/mlx5/main.c    | 150 +++++++++++++++++++++++++++++++----
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   2 +
 drivers/infiniband/hw/mlx5/qp.c      |  61 ++++++++++++--
 3 files changed, 191 insertions(+), 22 deletions(-)

--
2.7.4


* [PATCH rdma-next 1/5] IB/mlx5: Port events in RoCE now rely on netdev events
From: Leon Romanovsky @ 2016-09-18 10:08 UTC
  To: dledford@redhat.com
  Cc: linux-rdma@vger.kernel.org, Aviv Heller

From: Aviv Heller <avivh@mellanox.com>

Since ib_query_port() in RoCE returns the state of the port's netdev
as the port state, it makes sense to propagate port up/down events to
ib_core when the netdev state changes, instead of relying on the
traditional core events.

This also keeps the events and ib_query_port() synchronized.
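
For reference, a ULP sees these transitions through the usual ib_core
event handler registration; a minimal sketch (the names on the ULP side
are illustrative):

#include <rdma/ib_verbs.h>

static struct ib_event_handler ulp_event_handler;

static void ulp_port_event(struct ib_event_handler *handler,
			   struct ib_event *event)
{
	/* With this patch, on RoCE these events follow the netdev
	 * NETDEV_UP / NETDEV_DOWN transitions.
	 */
	if (event->event == IB_EVENT_PORT_ACTIVE ||
	    event->event == IB_EVENT_PORT_ERR)
		pr_info("%s: port %u is %s\n", event->device->name,
			event->element.port_num,
			event->event == IB_EVENT_PORT_ACTIVE ?
			"active" : "down");
}

static int ulp_watch_ports(struct ib_device *device)
{
	INIT_IB_EVENT_HANDLER(&ulp_event_handler, device, ulp_port_event);
	return ib_register_event_handler(&ulp_event_handler);
}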

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/infiniband/hw/mlx5/main.c | 66 ++++++++++++++++++++++++++++++---------
 1 file changed, 51 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 1b4094b..de7558b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -106,13 +106,32 @@ static int mlx5_netdev_event(struct notifier_block *this,
 	struct mlx5_ib_dev *ibdev = container_of(this, struct mlx5_ib_dev,
 						 roce.nb);
 
-	if ((event != NETDEV_UNREGISTER) && (event != NETDEV_REGISTER))
-		return NOTIFY_DONE;
+	switch (event) {
+	case NETDEV_REGISTER:
+	case NETDEV_UNREGISTER:
+		write_lock(&ibdev->roce.netdev_lock);
+		if (ndev->dev.parent == &ibdev->mdev->pdev->dev)
+			ibdev->roce.netdev = (event == NETDEV_UNREGISTER) ?
+					     NULL : ndev;
+		write_unlock(&ibdev->roce.netdev_lock);
+		break;
 
-	write_lock(&ibdev->roce.netdev_lock);
-	if (ndev->dev.parent == &ibdev->mdev->pdev->dev)
-		ibdev->roce.netdev = (event == NETDEV_UNREGISTER) ? NULL : ndev;
-	write_unlock(&ibdev->roce.netdev_lock);
+	case NETDEV_UP:
+	case NETDEV_DOWN:
+		if (ndev == ibdev->roce.netdev && ibdev->ib_active) {
+			struct ib_event ibev = {0};
+
+			ibev.device = &ibdev->ib_dev;
+			ibev.event = (event == NETDEV_UP) ?
+				     IB_EVENT_PORT_ACTIVE : IB_EVENT_PORT_ERR;
+			ibev.element.port_num = 1;
+			ib_dispatch_event(&ibev);
+		}
+		break;
+
+	default:
+		break;
+	}
 
 	return NOTIFY_DONE;
 }
@@ -2097,14 +2116,19 @@ static void mlx5_ib_event(struct mlx5_core_dev *dev, void *context,
 		break;
 
 	case MLX5_DEV_EVENT_PORT_UP:
-		ibev.event = IB_EVENT_PORT_ACTIVE;
-		port = (u8)param;
-		break;
-
 	case MLX5_DEV_EVENT_PORT_DOWN:
 	case MLX5_DEV_EVENT_PORT_INITIALIZED:
-		ibev.event = IB_EVENT_PORT_ERR;
 		port = (u8)param;
+
+		/* In RoCE, port up/down events are handled in
+		 * mlx5_netdev_event().
+		 */
+		if (mlx5_ib_port_link_layer(&ibdev->ib_dev, port) ==
+			IB_LINK_LAYER_ETHERNET)
+			return;
+
+		ibev.event = (event == MLX5_DEV_EVENT_PORT_UP) ?
+			     IB_EVENT_PORT_ACTIVE : IB_EVENT_PORT_ERR;
 		break;
 
 	case MLX5_DEV_EVENT_LID_CHANGE:
@@ -2509,14 +2533,24 @@ static void get_dev_fw_str(struct ib_device *ibdev, char *str,
 		       fw_rev_min(dev->mdev), fw_rev_sub(dev->mdev));
 }
 
+static void mlx5_remove_roce_notifier(struct mlx5_ib_dev *dev)
+{
+	if (dev->roce.nb.notifier_call) {
+		unregister_netdevice_notifier(&dev->roce.nb);
+		dev->roce.nb.notifier_call = NULL;
+	}
+}
+
 static int mlx5_enable_roce(struct mlx5_ib_dev *dev)
 {
 	int err;
 
 	dev->roce.nb.notifier_call = mlx5_netdev_event;
 	err = register_netdevice_notifier(&dev->roce.nb);
-	if (err)
+	if (err) {
+		dev->roce.nb.notifier_call = NULL;
 		return err;
+	}
 
 	err = mlx5_nic_vport_enable_roce(dev->mdev);
 	if (err)
@@ -2525,14 +2559,13 @@ static int mlx5_enable_roce(struct mlx5_ib_dev *dev)
 	return 0;
 
 err_unregister_netdevice_notifier:
-	unregister_netdevice_notifier(&dev->roce.nb);
+	mlx5_remove_roce_notifier(dev);
 	return err;
 }
 
 static void mlx5_disable_roce(struct mlx5_ib_dev *dev)
 {
 	mlx5_nic_vport_disable_roce(dev->mdev);
-	unregister_netdevice_notifier(&dev->roce.nb);
 }
 
 static void mlx5_ib_dealloc_q_counters(struct mlx5_ib_dev *dev)
@@ -2881,8 +2914,10 @@ err_rsrc:
 	destroy_dev_resources(&dev->devr);
 
 err_disable_roce:
-	if (ll == IB_LINK_LAYER_ETHERNET)
+	if (ll == IB_LINK_LAYER_ETHERNET) {
 		mlx5_disable_roce(dev);
+		mlx5_remove_roce_notifier(dev);
+	}
 
 err_free_port:
 	kfree(dev->port);
@@ -2898,6 +2933,7 @@ static void mlx5_ib_remove(struct mlx5_core_dev *mdev, void *context)
 	struct mlx5_ib_dev *dev = context;
 	enum rdma_link_layer ll = mlx5_ib_port_link_layer(&dev->ib_dev, 1);
 
+	mlx5_remove_roce_notifier(dev);
 	ib_unregister_device(&dev->ib_dev);
 	mlx5_ib_dealloc_q_counters(dev);
 	destroy_umrc_res(dev);
-- 
2.7.4


* [PATCH rdma-next 2/5] IB/mlx5: Merge vports flow steering during LAG
From: Leon Romanovsky @ 2016-09-18 10:08 UTC
  To: dledford@redhat.com
  Cc: linux-rdma@vger.kernel.org, Aviv Heller

From: Aviv Heller <avivh@mellanox.com>

This is done in two steps:
1) Issuing CREATE_VPORT_LAG, so that Ethernet traffic from
both ports arrives on PF0's root flow table and all raw-eth
traffic can be caught on PF0.
2) Creating a LAG demux flow table to direct all non-raw-eth
traffic back to its source port, ensuring that normal Ethernet
traffic "jumps" to the root flow table of its RX port (non-LAG behavior).

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/infiniband/hw/mlx5/main.c    | 49 ++++++++++++++++++++++++++++++++++++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
 2 files changed, 50 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index de7558b..ea766d3 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2533,6 +2533,47 @@ static void get_dev_fw_str(struct ib_device *ibdev, char *str,
 		       fw_rev_min(dev->mdev), fw_rev_sub(dev->mdev));
 }
 
+static int mlx5_roce_lag_init(struct mlx5_ib_dev *dev)
+{
+	struct mlx5_core_dev *mdev = dev->mdev;
+	struct mlx5_flow_namespace *ns = mlx5_get_flow_namespace(mdev,
+								 MLX5_FLOW_NAMESPACE_LAG);
+	struct mlx5_flow_table *ft;
+	int err;
+
+	if (!ns || !mlx5_lag_is_active(mdev))
+		return 0;
+
+	err = mlx5_cmd_create_vport_lag(mdev);
+	if (err)
+		return err;
+
+	ft = mlx5_create_lag_demux_flow_table(ns, 0, 0);
+	if (IS_ERR(ft)) {
+		err = PTR_ERR(ft);
+		goto err_destroy_vport_lag;
+	}
+
+	dev->flow_db.lag_demux_ft = ft;
+	return 0;
+
+err_destroy_vport_lag:
+	mlx5_cmd_destroy_vport_lag(mdev);
+	return err;
+}
+
+static void mlx5_roce_lag_cleanup(struct mlx5_ib_dev *dev)
+{
+	struct mlx5_core_dev *mdev = dev->mdev;
+
+	if (dev->flow_db.lag_demux_ft) {
+		mlx5_destroy_flow_table(dev->flow_db.lag_demux_ft);
+		dev->flow_db.lag_demux_ft = NULL;
+
+		mlx5_cmd_destroy_vport_lag(mdev);
+	}
+}
+
 static void mlx5_remove_roce_notifier(struct mlx5_ib_dev *dev)
 {
 	if (dev->roce.nb.notifier_call) {
@@ -2556,8 +2597,15 @@ static int mlx5_enable_roce(struct mlx5_ib_dev *dev)
 	if (err)
 		goto err_unregister_netdevice_notifier;
 
+	err = mlx5_roce_lag_init(dev);
+	if (err)
+		goto err_disable_roce;
+
 	return 0;
 
+err_disable_roce:
+	mlx5_nic_vport_disable_roce(dev->mdev);
+
 err_unregister_netdevice_notifier:
 	mlx5_remove_roce_notifier(dev);
 	return err;
@@ -2565,6 +2613,7 @@ err_unregister_netdevice_notifier:
 
 static void mlx5_disable_roce(struct mlx5_ib_dev *dev)
 {
+	mlx5_roce_lag_cleanup(dev);
 	mlx5_nic_vport_disable_roce(dev->mdev);
 }
 
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 372385d..b13063a 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -156,6 +156,7 @@ struct mlx5_ib_flow_handler {
 
 struct mlx5_ib_flow_db {
 	struct mlx5_ib_flow_prio	prios[MLX5_IB_NUM_FLOW_FT];
+	struct mlx5_flow_table		*lag_demux_ft;
 	/* Protect flow steering bypass flow tables
 	 * when add/del flow rules.
 	 * only single add/removal of flow steering rule could be done
-- 
2.7.4


* [PATCH rdma-next 3/5] IB/mlx5: Port status track LAG master, when LAG is active
From: Leon Romanovsky @ 2016-09-18 10:08 UTC
  To: dledford@redhat.com
  Cc: linux-rdma@vger.kernel.org, Aviv Heller

From: Aviv Heller <avivh@mellanox.com>

When LAG is active, port up/down events should be triggered
by tracking the LAG master, and not one of the two slave
netdevs.

In the same manner, ib_query_port() should return the details
of the LAG master.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/infiniband/hw/mlx5/main.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index ea766d3..1373650 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -117,8 +117,17 @@ static int mlx5_netdev_event(struct notifier_block *this,
 		break;
 
 	case NETDEV_UP:
-	case NETDEV_DOWN:
-		if (ndev == ibdev->roce.netdev && ibdev->ib_active) {
+	case NETDEV_DOWN: {
+		struct net_device *lag_ndev = mlx5_lag_get_roce_netdev(ibdev->mdev);
+		struct net_device *upper = NULL;
+
+		if (lag_ndev) {
+			upper = netdev_master_upper_dev_get(lag_ndev);
+			dev_put(lag_ndev);
+		}
+
+		if ((upper == ndev || (!upper && ndev == ibdev->roce.netdev))
+		    && ibdev->ib_active) {
 			struct ib_event ibev = {0};
 
 			ibev.device = &ibdev->ib_dev;
@@ -128,6 +137,7 @@ static int mlx5_netdev_event(struct notifier_block *this,
 			ib_dispatch_event(&ibev);
 		}
 		break;
+	}
 
 	default:
 		break;
@@ -142,6 +152,10 @@ static struct net_device *mlx5_ib_get_netdev(struct ib_device *device,
 	struct mlx5_ib_dev *ibdev = to_mdev(device);
 	struct net_device *ndev;
 
+	ndev = mlx5_lag_get_roce_netdev(ibdev->mdev);
+	if (ndev)
+		return ndev;
+
 	/* Ensure ndev does not disappear before we invoke dev_hold()
 	 */
 	read_lock(&ibdev->roce.netdev_lock);
@@ -157,7 +171,7 @@ static int mlx5_query_port_roce(struct ib_device *device, u8 port_num,
 				struct ib_port_attr *props)
 {
 	struct mlx5_ib_dev *dev = to_mdev(device);
-	struct net_device *ndev;
+	struct net_device *ndev, *upper;
 	enum ib_mtu ndev_ib_mtu;
 	u16 qkey_viol_cntr;
 
@@ -181,6 +195,17 @@ static int mlx5_query_port_roce(struct ib_device *device, u8 port_num,
 	if (!ndev)
 		return 0;
 
+	if (mlx5_lag_is_active(dev->mdev)) {
+		rcu_read_lock();
+		upper = netdev_master_upper_dev_get_rcu(ndev);
+		if (upper) {
+			dev_put(ndev);
+			ndev = upper;
+			dev_hold(ndev);
+		}
+		rcu_read_unlock();
+	}
+
 	if (netif_running(ndev) && netif_carrier_ok(ndev)) {
 		props->state      = IB_PORT_ACTIVE;
 		props->phys_state = 5;
-- 
2.7.4


* [PATCH rdma-next 4/5] IB/mlx5: Set unique device name on LAG
From: Leon Romanovsky @ 2016-09-18 10:08 UTC
  To: dledford@redhat.com
  Cc: linux-rdma@vger.kernel.org, Aviv Heller

From: Aviv Heller <avivh@mellanox.com>

The IB bond device is now named 'mlx5_bond_X' instead of
'mlx5_X'.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/infiniband/hw/mlx5/main.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 1373650..c46884b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2754,6 +2754,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	struct mlx5_ib_dev *dev;
 	enum rdma_link_layer ll;
 	int port_type_cap;
+	const char *name;
 	int err;
 	int i;
 
@@ -2786,7 +2787,12 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 
 	MLX5_INIT_DOORBELL_LOCK(&dev->uar_lock);
 
-	strlcpy(dev->ib_dev.name, "mlx5_%d", IB_DEVICE_NAME_MAX);
+	if (!mlx5_lag_is_active(mdev))
+		name = "mlx5_%d";
+	else
+		name = "mlx5_bond_%d";
+
+	strlcpy(dev->ib_dev.name, name, IB_DEVICE_NAME_MAX);
 	dev->ib_dev.owner		= THIS_MODULE;
 	dev->ib_dev.node_type		= RDMA_NODE_IB_CA;
 	dev->ib_dev.local_dma_lkey	= 0 /* not supported for now */;
-- 
2.7.4


* [PATCH rdma-next 5/5] IB/mlx5: LAG QP load balancing
From: Leon Romanovsky @ 2016-09-18 10:08 UTC
  To: dledford@redhat.com
  Cc: linux-rdma@vger.kernel.org, Aviv Heller

From: Aviv Heller <avivh@mellanox.com>

When LAG is active, QP tx affinity (the physical port
to which a QP is affined, or the TIS in the case of raw-eth)
is set in a round-robin fashion during the state transition
from RESET to INIT.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
 drivers/infiniband/hw/mlx5/qp.c      | 61 +++++++++++++++++++++++++++++++++---
 2 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index b13063a..27894e7 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -603,6 +603,7 @@ struct mlx5_roce {
 	rwlock_t		netdev_lock;
 	struct net_device	*netdev;
 	struct notifier_block	nb;
+	atomic_t		next_port;
 };
 
 struct mlx5_ib_dev {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 0dd7d93..4616dac 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1854,7 +1854,7 @@ static void get_cqs(enum ib_qp_type qp_type,
 }
 
 static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
-				u16 operation);
+				u16 operation, u8 lag_tx_affinity);
 
 static void destroy_qp_common(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp)
 {
@@ -1885,7 +1885,7 @@ static void destroy_qp_common(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp)
 						  &base->mqp);
 		} else {
 			err = modify_raw_packet_qp(dev, qp,
-						   MLX5_CMD_OP_2RST_QP);
+						   MLX5_CMD_OP_2RST_QP, 0);
 		}
 		if (err)
 			mlx5_ib_warn(dev, "mlx5_ib: modify QP 0x%06x to RESET failed\n",
@@ -2151,6 +2151,31 @@ static int modify_raw_packet_eth_prio(struct mlx5_core_dev *dev,
 	return err;
 }
 
+static int modify_raw_packet_tx_affinity(struct mlx5_core_dev *dev,
+					 struct mlx5_ib_sq *sq, u8 tx_affinity)
+{
+	void *in;
+	void *tisc;
+	int inlen;
+	int err;
+
+	inlen = MLX5_ST_SZ_BYTES(modify_tis_in);
+	in = mlx5_vzalloc(inlen);
+	if (!in)
+		return -ENOMEM;
+
+	MLX5_SET(modify_tis_in, in, bitmask.lag_tx_port_affinity, 1);
+
+	tisc = MLX5_ADDR_OF(modify_tis_in, in, ctx);
+	MLX5_SET(tisc, tisc, lag_tx_port_affinity, tx_affinity);
+
+	err = mlx5_core_modify_tis(dev, sq->tisn, in, inlen);
+
+	kvfree(in);
+
+	return err;
+}
+
 static int mlx5_set_path(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
 			 const struct ib_ah_attr *ah,
 			 struct mlx5_qp_path *path, u8 port, int attr_mask,
@@ -2420,7 +2445,7 @@ out:
 }
 
 static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
-				u16 operation)
+				u16 operation, u8 tx_affinity)
 {
 	struct mlx5_ib_raw_packet_qp *raw_packet_qp = &qp->raw_packet_qp;
 	struct mlx5_ib_rq *rq = &raw_packet_qp->rq;
@@ -2459,8 +2484,16 @@ static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
 			return err;
 	}
 
-	if (qp->sq.wqe_cnt)
+	if (qp->sq.wqe_cnt) {
+		if (tx_affinity) {
+			err = modify_raw_packet_tx_affinity(dev->mdev, sq,
+							    tx_affinity);
+			if (err)
+				return err;
+		}
+
 		return modify_raw_packet_qp_sq(dev->mdev, sq, sq_state);
+	}
 
 	return 0;
 }
@@ -2519,6 +2552,7 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 	int mlx5_st;
 	int err;
 	u16 op;
+	u8 tx_affinity = 0;
 
 	in = kzalloc(sizeof(*in), GFP_KERNEL);
 	if (!in)
@@ -2549,6 +2583,23 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 		}
 	}
 
+	if ((cur_state == IB_QPS_RESET) && (new_state == IB_QPS_INIT)) {
+		if ((ibqp->qp_type == IB_QPT_RC) ||
+		    (ibqp->qp_type == IB_QPT_UD &&
+		     !(qp->flags & MLX5_IB_QP_SQPN_QP1)) ||
+		    (ibqp->qp_type == IB_QPT_UC) ||
+		    (ibqp->qp_type == IB_QPT_RAW_PACKET) ||
+		    (ibqp->qp_type == IB_QPT_XRC_INI) ||
+		    (ibqp->qp_type == IB_QPT_XRC_TGT)) {
+			if (mlx5_lag_is_active(dev->mdev)) {
+				tx_affinity = (unsigned int)atomic_add_return(1,
+						&dev->roce.next_port) %
+						MLX5_MAX_PORTS + 1;
+				context->flags |= cpu_to_be32(tx_affinity << 24);
+			}
+		}
+	}
+
 	if (is_sqp(ibqp->qp_type)) {
 		context->mtu_msgmax = (IB_MTU_256 << 5) | 8;
 	} else if (ibqp->qp_type == IB_QPT_UD ||
@@ -2692,7 +2743,7 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 	in->optparam = cpu_to_be32(optpar);
 
 	if (qp->ibqp.qp_type == IB_QPT_RAW_PACKET)
-		err = modify_raw_packet_qp(dev, qp, op);
+		err = modify_raw_packet_qp(dev, qp, op, tx_affinity);
 	else
 		err = mlx5_core_qp_modify(dev->mdev, op, in, sqd_event,
 					  &base->mqp);
-- 
2.7.4
