* [PATCH V2 mlx5-next 00/10] Add support to get xmit slave
@ 2020-04-20  7:54 Maor Gottlieb
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get Maor Gottlieb
                   ` (9 more replies)
  0 siblings, 10 replies; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-20  7:54 UTC (permalink / raw)
  To: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba
  Cc: leonro, saeedm, jiri, linux-rdma, netdev, alexr, Maor Gottlieb

Hi Dave,

This series is a combination of netdev and RDMA, so in order to avoid
conflicts, we would like to ask you to route this series through
mlx5-next shared branch. It is based on v5.7-rc1 tag.

---------------------------------------------------------------------

The following series adds support to get the LAG master's xmit slave by
introducing a new .ndo, ndo_xmit_slave_get. Every LAG module can
implement it; it is first implemented in the bonding driver.
This is a follow-up to the RFC discussion [1].

The main motivation for doing this is drivers that offload part
of the LAG functionality. For example, Mellanox ConnectX hardware
implements RoCE LAG, which selects the TX affinity when the resources
are created and remaps the port when it goes down.

The first part of this patchset introduces the new .ndo and adds
support for it to the bonding module.

The second part adds support to get the RoCE LAG xmit slave by building
an skb for the RoCE packet based on the AH attributes and calling the new .ndo.

The third part changes the mlx5 driver to set the QP's affinity
port according to the slave found by the .ndo.
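
The three parts above can be pictured as one call chain. The following is a minimal userspace sketch of that flow, under stated assumptions: a master device exposes an ndo-style callback that maps a packet to one of its slaves, and the RDMA side uses the answer as the QP's TX affinity. All names (`toy_master`, `qp_tx_affinity`, etc.) are illustrative, not kernel API.

```c
#include <stddef.h>

struct pkt { unsigned int hash; };

struct toy_master {
	int nslaves;
	/* ndo-style hook: returns a slave index, or -1 if none can tx */
	int (*xmit_get_slave)(struct toy_master *m, const struct pkt *p);
};

/* One possible LAG implementation: hash-based selection (part 1). */
static int hash_get_slave(struct toy_master *m, const struct pkt *p)
{
	return m->nslaves ? (int)(p->hash % m->nslaves) : -1;
}

/* Mirrors part 3: the QP's affinity port follows the xmit slave. */
static int qp_tx_affinity(struct toy_master *m, const struct pkt *p)
{
	if (!m->xmit_get_slave)
		return -1;
	return m->xmit_get_slave(m, p);
}
```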

Thanks

[1] https://lore.kernel.org/netdev/20200126132126.9981-1-maorg@mellanox.com/

Change log:
v2: The first patch wasn't sent in v1.
v1: https://lore.kernel.org/netdev/ac373456-b838-29cf-645f-b1ea1a93e3b0@gmail.com/T/#t 

Maor Gottlieb (10):
  net/core: Introduce master_xmit_slave_get
  bonding: Rename slave_arr to usable_slaves
  bonding: Add helpers to get xmit slave
  bonding: Implement ndo_xmit_slave_get
  RDMA/core: Add LAG functionality
  RDMA/core: Get xmit slave for LAG
  net/mlx5: Change lag mutex lock to spin lock
  net/mlx5: Add support to get lag physical port
  RDMA/mlx5: Refactor affinity related code
  RDMA/mlx5: Set lag tx affinity according to slave

 drivers/infiniband/core/Makefile              |   2 +-
 drivers/infiniband/core/lag.c                 | 139 +++++++++
 drivers/infiniband/core/verbs.c               |  44 ++-
 drivers/infiniband/hw/mlx5/ah.c               |   4 +
 drivers/infiniband/hw/mlx5/gsi.c              |  34 ++-
 drivers/infiniband/hw/mlx5/main.c             |   2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |   1 +
 drivers/infiniband/hw/mlx5/qp.c               | 123 +++++---
 drivers/net/bonding/bond_alb.c                |  39 ++-
 drivers/net/bonding/bond_main.c               | 272 +++++++++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/lag.c |  66 +++--
 include/linux/mlx5/driver.h                   |   2 +
 include/linux/mlx5/mlx5_ifc.h                 |   4 +-
 include/linux/mlx5/qp.h                       |   2 +
 include/linux/netdevice.h                     |   3 +
 include/net/bond_alb.h                        |   4 +
 include/net/bonding.h                         |   3 +-
 include/net/lag.h                             |  32 +++
 include/rdma/ib_verbs.h                       |   2 +
 include/rdma/lag.h                            |  22 ++
 20 files changed, 621 insertions(+), 179 deletions(-)
 create mode 100644 drivers/infiniband/core/lag.c
 create mode 100644 include/rdma/lag.h

-- 
2.17.2


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20  7:54 [PATCH V2 mlx5-next 00/10] Add support to get xmit slave Maor Gottlieb
@ 2020-04-20  7:54 ` Maor Gottlieb
  2020-04-20 14:01   ` Jiri Pirko
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 02/10] bonding: Rename slave_arr to usable_slaves Maor Gottlieb
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-20  7:54 UTC (permalink / raw)
  To: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba
  Cc: leonro, saeedm, jiri, linux-rdma, netdev, alexr, Maor Gottlieb

Add a new ndo to get the xmit slave of a master device.
The user should release the slave when it is no longer needed.
When the slave selection method is hash based, the user can ask to get
the xmit slave under the assumption that all slaves can transmit, by
setting the LAG_FLAGS_HASH_ALL_SLAVES bit in the flags argument.
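
The ownership contract described above (the getter returns the slave with a reference already taken, and the caller releases it) can be sketched in userspace as follows; `toy_dev`, `toy_hold` and `toy_put` are illustrative stand-ins for netdev refcounting, not kernel API.

```c
#include <stddef.h>

struct toy_dev { int refcnt; };

static void toy_hold(struct toy_dev *d) { d->refcnt++; }
static void toy_put(struct toy_dev *d)  { d->refcnt--; }

/* Like master_xmit_get_slave(): returns NULL when no slave is found,
 * otherwise returns the slave with its usage count incremented, so
 * the caller must release it with toy_put() when done. */
static struct toy_dev *toy_xmit_get_slave(struct toy_dev *slave)
{
	if (slave)
		toy_hold(slave);	/* caller now owns a reference */
	return slave;
}
```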

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
---
 include/linux/netdevice.h |  3 +++
 include/net/lag.h         | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 130a668049ab..e8852f3ad0b6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1389,6 +1389,9 @@ struct net_device_ops {
 						 struct netlink_ext_ack *extack);
 	int			(*ndo_del_slave)(struct net_device *dev,
 						 struct net_device *slave_dev);
+	struct net_device*	(*ndo_xmit_get_slave)(struct net_device *master_dev,
+						      struct sk_buff *skb,
+						      u16 flags);
 	netdev_features_t	(*ndo_fix_features)(struct net_device *dev,
 						    netdev_features_t features);
 	int			(*ndo_set_features)(struct net_device *dev,
diff --git a/include/net/lag.h b/include/net/lag.h
index 95b880e6fdde..c43b035989c4 100644
--- a/include/net/lag.h
+++ b/include/net/lag.h
@@ -6,6 +6,38 @@
 #include <linux/if_team.h>
 #include <net/bonding.h>
 
+enum lag_get_slaves_flags {
+	LAG_FLAGS_HASH_ALL_SLAVES = 1<<0
+};
+
+/**
+ * master_xmit_get_slave - Get the xmit slave of master device
+ * @skb: The packet
+ * @flags: lag_get_slaves_flags
+ *
+ * This can be called from any context and does its own locking.
+ * The returned handle has the usage count incremented and the caller must
+ * use dev_put() to release it when it is no longer needed.
+ * %NULL is returned if no slave is found.
+ */
+
+static inline
+struct net_device *master_xmit_get_slave(struct net_device *master_dev,
+					 struct sk_buff *skb,
+					 u16 flags)
+{
+	const struct net_device_ops *ops = master_dev->netdev_ops;
+	struct net_device *slave = NULL;
+
+	rcu_read_lock();
+	if (ops->ndo_xmit_get_slave)
+		slave = ops->ndo_xmit_get_slave(master_dev, skb, flags);
+	if (slave)
+		dev_hold(slave);
+	rcu_read_unlock();
+	return slave;
+}
+
 static inline bool net_lag_port_dev_txable(const struct net_device *port_dev)
 {
 	if (netif_is_team_port(port_dev))
-- 
2.17.2



* [PATCH V2 mlx5-next 02/10] bonding: Rename slave_arr to usable_slaves
  2020-04-20  7:54 [PATCH V2 mlx5-next 00/10] Add support to get xmit slave Maor Gottlieb
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get Maor Gottlieb
@ 2020-04-20  7:54 ` Maor Gottlieb
  2020-04-20 14:17   ` Jiri Pirko
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 03/10] bonding: Add helpers to get xmit slave Maor Gottlieb
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-20  7:54 UTC (permalink / raw)
  To: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba
  Cc: leonro, saeedm, jiri, linux-rdma, netdev, alexr, Maor Gottlieb

This patch renames slave_arr to usable_slaves, since we will
have two arrays: one for the usable slaves and the other for all
slaves. In addition, move the bond_skip_slave logic into its own function.
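
The factored-out bond_skip_slave() logic is a swap-with-last removal on an unordered array: the to-be-skipped slave's slot is overwritten with the final entry so the xmit path can never pick it. A self-contained sketch of that technique (with a toy integer array standing in for the slave pointers):

```c
#include <stddef.h>

struct toy_arr { int count; int arr[8]; };

/* Remove 'skip' from the unordered array, if present, by moving the
 * last entry into its slot and shrinking the count. Order is not
 * preserved, which is fine for a hash-indexed slave array. */
static void toy_skip_slave(struct toy_arr *slaves, int skip)
{
	int idx;

	for (idx = 0; slaves && idx < slaves->count; idx++) {
		if (slaves->arr[idx] == skip) {
			slaves->arr[idx] = slaves->arr[slaves->count - 1];
			slaves->count--;
			break;
		}
	}
}
```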

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
---
 drivers/net/bonding/bond_alb.c  |  4 +-
 drivers/net/bonding/bond_main.c | 85 +++++++++++++++++----------------
 include/net/bonding.h           |  2 +-
 3 files changed, 48 insertions(+), 43 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index c81698550e5a..7bb49b049dcc 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1360,7 +1360,7 @@ netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 				struct bond_up_slave *slaves;
 				unsigned int count;
 
-				slaves = rcu_dereference(bond->slave_arr);
+				slaves = rcu_dereference(bond->usable_slaves);
 				count = slaves ? READ_ONCE(slaves->count) : 0;
 				if (likely(count))
 					tx_slave = slaves->arr[hash_index %
@@ -1494,7 +1494,7 @@ netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 			struct bond_up_slave *slaves;
 			unsigned int count;
 
-			slaves = rcu_dereference(bond->slave_arr);
+			slaves = rcu_dereference(bond->usable_slaves);
 			count = slaves ? READ_ONCE(slaves->count) : 0;
 			if (likely(count))
 				tx_slave = slaves->arr[bond_xmit_hash(bond, skb) %
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2e70e43c5df5..2cb41d480ae2 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4087,6 +4087,29 @@ static void bond_slave_arr_handler(struct work_struct *work)
 	bond_slave_arr_work_rearm(bond, 1);
 }
 
+static void bond_skip_slave(struct bond_up_slave *slaves,
+			    struct slave *skipslave)
+{
+	int idx;
+
+	/* Rare situation where caller has asked to skip a specific
+	 * slave but allocation failed (most likely!). BTW this is
+	 * only possible when the call is initiated from
+	 * __bond_release_one(). In this situation; overwrite the
+	 * skipslave entry in the array with the last entry from the
+	 * array to avoid a situation where the xmit path may choose
+	 * this to-be-skipped slave to send a packet out.
+	 */
+	for (idx = 0; slaves && idx < slaves->count; idx++) {
+		if (skipslave == slaves->arr[idx]) {
+			slaves->arr[idx] =
+				slaves->arr[slaves->count - 1];
+			slaves->count--;
+			break;
+		}
+	}
+}
+
 /* Build the usable slaves array in control path for modes that use xmit-hash
  * to determine the slave interface -
  * (a) BOND_MODE_8023AD
@@ -4097,9 +4120,9 @@ static void bond_slave_arr_handler(struct work_struct *work)
  */
 int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 {
+	struct bond_up_slave *usable_slaves, *old_usable_slaves;
 	struct slave *slave;
 	struct list_head *iter;
-	struct bond_up_slave *new_arr, *old_arr;
 	int agg_id = 0;
 	int ret = 0;
 
@@ -4107,11 +4130,10 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 	WARN_ON(lockdep_is_held(&bond->mode_lock));
 #endif
 
-	new_arr = kzalloc(offsetof(struct bond_up_slave, arr[bond->slave_cnt]),
-			  GFP_KERNEL);
-	if (!new_arr) {
+	usable_slaves = kzalloc(struct_size(usable_slaves, arr,
+					    bond->slave_cnt), GFP_KERNEL);
+	if (!usable_slaves) {
 		ret = -ENOMEM;
-		pr_err("Failed to build slave-array.\n");
 		goto out;
 	}
 	if (BOND_MODE(bond) == BOND_MODE_8023AD) {
@@ -4119,14 +4141,14 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 
 		if (bond_3ad_get_active_agg_info(bond, &ad_info)) {
 			pr_debug("bond_3ad_get_active_agg_info failed\n");
-			kfree_rcu(new_arr, rcu);
+			kfree_rcu(usable_slaves, rcu);
 			/* No active aggragator means it's not safe to use
 			 * the previous array.
 			 */
-			old_arr = rtnl_dereference(bond->slave_arr);
-			if (old_arr) {
-				RCU_INIT_POINTER(bond->slave_arr, NULL);
-				kfree_rcu(old_arr, rcu);
+			old_usable_slaves = rtnl_dereference(bond->usable_slaves);
+			if (old_usable_slaves) {
+				RCU_INIT_POINTER(bond->usable_slaves, NULL);
+				kfree_rcu(old_usable_slaves, rcu);
 			}
 			goto out;
 		}
@@ -4146,37 +4168,20 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 			continue;
 
 		slave_dbg(bond->dev, slave->dev, "Adding slave to tx hash array[%d]\n",
-			  new_arr->count);
+			  usable_slaves->count);
 
-		new_arr->arr[new_arr->count++] = slave;
+		usable_slaves->arr[usable_slaves->count++] = slave;
 	}
 
-	old_arr = rtnl_dereference(bond->slave_arr);
-	rcu_assign_pointer(bond->slave_arr, new_arr);
-	if (old_arr)
-		kfree_rcu(old_arr, rcu);
+	old_usable_slaves = rtnl_dereference(bond->usable_slaves);
+	rcu_assign_pointer(bond->usable_slaves, usable_slaves);
+	if (old_usable_slaves)
+		kfree_rcu(old_usable_slaves, rcu);
 out:
-	if (ret != 0 && skipslave) {
-		int idx;
-
-		/* Rare situation where caller has asked to skip a specific
-		 * slave but allocation failed (most likely!). BTW this is
-		 * only possible when the call is initiated from
-		 * __bond_release_one(). In this situation; overwrite the
-		 * skipslave entry in the array with the last entry from the
-		 * array to avoid a situation where the xmit path may choose
-		 * this to-be-skipped slave to send a packet out.
-		 */
-		old_arr = rtnl_dereference(bond->slave_arr);
-		for (idx = 0; old_arr != NULL && idx < old_arr->count; idx++) {
-			if (skipslave == old_arr->arr[idx]) {
-				old_arr->arr[idx] =
-				    old_arr->arr[old_arr->count-1];
-				old_arr->count--;
-				break;
-			}
-		}
-	}
+	if (ret != 0 && skipslave)
+		bond_skip_slave(rtnl_dereference(bond->usable_slaves),
+				skipslave);
+
 	return ret;
 }
 
@@ -4192,7 +4197,7 @@ static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb,
 	struct bond_up_slave *slaves;
 	unsigned int count;
 
-	slaves = rcu_dereference(bond->slave_arr);
+	slaves = rcu_dereference(bond->usable_slaves);
 	count = slaves ? READ_ONCE(slaves->count) : 0;
 	if (likely(count)) {
 		slave = slaves->arr[bond_xmit_hash(bond, skb) % count];
@@ -4483,9 +4488,9 @@ static void bond_uninit(struct net_device *bond_dev)
 		__bond_release_one(bond_dev, slave->dev, true, true);
 	netdev_info(bond_dev, "Released all slaves\n");
 
-	arr = rtnl_dereference(bond->slave_arr);
+	arr = rtnl_dereference(bond->usable_slaves);
 	if (arr) {
-		RCU_INIT_POINTER(bond->slave_arr, NULL);
+		RCU_INIT_POINTER(bond->usable_slaves, NULL);
 		kfree_rcu(arr, rcu);
 	}
 
diff --git a/include/net/bonding.h b/include/net/bonding.h
index dc2ce31a1f52..33bdb6d5182d 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -200,7 +200,7 @@ struct bonding {
 	struct   slave __rcu *curr_active_slave;
 	struct   slave __rcu *current_arp_slave;
 	struct   slave __rcu *primary_slave;
-	struct   bond_up_slave __rcu *slave_arr; /* Array of usable slaves */
+	struct   bond_up_slave __rcu *usable_slaves; /* Array of usable slaves */
 	bool     force_primary;
 	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
 	int     (*recv_probe)(const struct sk_buff *, struct bonding *,
-- 
2.17.2



* [PATCH V2 mlx5-next 03/10] bonding: Add helpers to get xmit slave
  2020-04-20  7:54 [PATCH V2 mlx5-next 00/10] Add support to get xmit slave Maor Gottlieb
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get Maor Gottlieb
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 02/10] bonding: Rename slave_arr to usable_slaves Maor Gottlieb
@ 2020-04-20  7:54 ` Maor Gottlieb
  2020-04-20 14:27   ` Jiri Pirko
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 04/10] bonding: Implement ndo_xmit_slave_get Maor Gottlieb
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-20  7:54 UTC (permalink / raw)
  To: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba
  Cc: leonro, saeedm, jiri, linux-rdma, netdev, alexr, Maor Gottlieb

These helpers will be used by both the xmit function
and the get xmit slave ndo.
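
The shared pattern these helpers establish is "select, then act": slave selection (e.g. hash modulo the usable-slave count, as in bond_xmit_3ad_xor_slave_get) is split from transmission so the same selection code serves both callers. A hedged userspace sketch, with illustrative types:

```c
#include <stddef.h>

struct toy_slaves { unsigned int count; int ids[8]; };

/* Selection only, no transmit: returns 1 and writes the chosen slave
 * id, or returns 0 when there is no usable slave (the xmit caller
 * would then drop the skb; the ndo caller would return NULL). */
static int toy_3ad_xor_slave_get(const struct toy_slaves *slaves,
				 unsigned int hash, int *out)
{
	if (!slaves || !slaves->count)
		return 0;
	*out = slaves->ids[hash % slaves->count];
	return 1;
}
```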

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
---
 drivers/net/bonding/bond_alb.c  | 35 ++++++++----
 drivers/net/bonding/bond_main.c | 94 +++++++++++++++++++++------------
 include/net/bond_alb.h          |  4 ++
 3 files changed, 89 insertions(+), 44 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 7bb49b049dcc..e863c694c309 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1334,11 +1334,11 @@ static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
 	return NETDEV_TX_OK;
 }
 
-netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
+struct slave *bond_xmit_tlb_slave_get(struct bonding *bond,
+				      struct sk_buff *skb)
 {
-	struct bonding *bond = netdev_priv(bond_dev);
-	struct ethhdr *eth_data;
 	struct slave *tx_slave = NULL;
+	struct ethhdr *eth_data;
 	u32 hash_index;
 
 	skb_reset_mac_header(skb);
@@ -1369,20 +1369,29 @@ netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 			break;
 		}
 	}
-	return bond_do_alb_xmit(skb, bond, tx_slave);
+	return tx_slave;
 }
 
-netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
+netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
-	struct ethhdr *eth_data;
+	struct slave *tx_slave;
+
+	tx_slave = bond_xmit_tlb_slave_get(bond, skb);
+	return bond_do_alb_xmit(skb, bond, tx_slave);
+}
+
+struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
+				      struct sk_buff *skb)
+{
 	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
-	struct slave *tx_slave = NULL;
 	static const __be32 ip_bcast = htonl(0xffffffff);
-	int hash_size = 0;
+	struct slave *tx_slave = NULL;
+	const u8 *hash_start = NULL;
 	bool do_tx_balance = true;
+	struct ethhdr *eth_data;
 	u32 hash_index = 0;
-	const u8 *hash_start = NULL;
+	int hash_size = 0;
 
 	skb_reset_mac_header(skb);
 	eth_data = eth_hdr(skb);
@@ -1501,7 +1510,15 @@ netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 						       count];
 		}
 	}
+	return tx_slave;
+}
+
+netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *tx_slave = NULL;
 
+	tx_slave = bond_xmit_alb_slave_get(bond, skb);
 	return bond_do_alb_xmit(skb, bond, tx_slave);
 }
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2cb41d480ae2..7e04be86fda8 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -82,6 +82,7 @@
 #include <net/bonding.h>
 #include <net/bond_3ad.h>
 #include <net/bond_alb.h>
+#include <net/lag.h>
 
 #include "bonding_priv.h"
 
@@ -3406,10 +3407,26 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 		(__force u32)flow_get_u32_src(&flow);
 	hash ^= (hash >> 16);
 	hash ^= (hash >> 8);
-
 	return hash >> 1;
 }
 
+static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond,
+						 struct sk_buff *skb,
+						 struct bond_up_slave *slaves)
+{
+	struct slave *slave;
+	unsigned int count;
+	u32 hash;
+
+	hash = bond_xmit_hash(bond, skb);
+	count = slaves ? READ_ONCE(slaves->count) : 0;
+	if (unlikely(!count))
+		return NULL;
+
+	slave = slaves->arr[hash % count];
+	return slave;
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -3923,16 +3940,15 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
 }
 
 /**
- * bond_xmit_slave_id - transmit skb through slave with slave_id
+ * bond_get_slave_by_id - get xmit slave with slave_id
  * @bond: bonding device that is transmitting
- * @skb: buffer to transmit
  * @slave_id: slave id up to slave_cnt-1 through which to transmit
  *
- * This function tries to transmit through slave with slave_id but in case
+ * This function tries to get slave with slave_id but in case
  * it fails, it tries to find the first available slave for transmission.
- * The skb is consumed in all cases, thus the function is void.
  */
-static void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int slave_id)
+static struct slave *bond_get_slave_by_id(struct bonding *bond,
+					  int slave_id)
 {
 	struct list_head *iter;
 	struct slave *slave;
@@ -3941,10 +3957,8 @@ static void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int sl
 	/* Here we start from the slave with slave_id */
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		if (--i < 0) {
-			if (bond_slave_can_tx(slave)) {
-				bond_dev_queue_xmit(bond, skb, slave->dev);
-				return;
-			}
+			if (bond_slave_can_tx(slave))
+				return slave;
 		}
 	}
 
@@ -3953,13 +3967,11 @@ static void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int sl
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		if (--i < 0)
 			break;
-		if (bond_slave_can_tx(slave)) {
-			bond_dev_queue_xmit(bond, skb, slave->dev);
-			return;
-		}
+		if (bond_slave_can_tx(slave))
+			return slave;
 	}
-	/* no slave that can tx has been found */
-	bond_tx_drop(bond->dev, skb);
+
+	return NULL;
 }
 
 /**
@@ -3995,10 +4007,9 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
 	return slave_id;
 }
 
-static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
-					struct net_device *bond_dev)
+static struct slave *bond_xmit_roundrobin_slave_get(struct bonding *bond,
+						    struct sk_buff *skb)
 {
-	struct bonding *bond = netdev_priv(bond_dev);
 	struct slave *slave;
 	int slave_cnt;
 	u32 slave_id;
@@ -4020,24 +4031,40 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 		if (iph->protocol == IPPROTO_IGMP) {
 			slave = rcu_dereference(bond->curr_active_slave);
 			if (slave)
-				bond_dev_queue_xmit(bond, skb, slave->dev);
-			else
-				bond_xmit_slave_id(bond, skb, 0);
-			return NETDEV_TX_OK;
+				return slave;
+			return bond_get_slave_by_id(bond, 0);
 		}
 	}
 
 non_igmp:
 	slave_cnt = READ_ONCE(bond->slave_cnt);
 	if (likely(slave_cnt)) {
-		slave_id = bond_rr_gen_slave_id(bond);
-		bond_xmit_slave_id(bond, skb, slave_id % slave_cnt);
-	} else {
-		bond_tx_drop(bond_dev, skb);
+		slave_id = bond_rr_gen_slave_id(bond) % slave_cnt;
+		return bond_get_slave_by_id(bond, slave_id);
 	}
+	return NULL;
+}
+
+static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
+					struct net_device *bond_dev)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+
+	slave = bond_xmit_roundrobin_slave_get(bond, skb);
+	if (slave)
+		bond_dev_queue_xmit(bond, skb, slave->dev);
+	else
+		bond_tx_drop(bond_dev, skb);
 	return NETDEV_TX_OK;
 }
 
+static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
+						      struct sk_buff *skb)
+{
+	return rcu_dereference(bond->curr_active_slave);
+}
+
 /* In active-backup mode, we know that bond->curr_active_slave is always valid if
  * the bond has a usable interface.
  */
@@ -4047,7 +4074,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct slave *slave;
 
-	slave = rcu_dereference(bond->curr_active_slave);
+	slave = bond_xmit_activebackup_slave_get(bond, skb);
 	if (slave)
 		bond_dev_queue_xmit(bond, skb, slave->dev);
 	else
@@ -4193,18 +4220,15 @@ static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb,
 				     struct net_device *dev)
 {
 	struct bonding *bond = netdev_priv(dev);
-	struct slave *slave;
 	struct bond_up_slave *slaves;
-	unsigned int count;
+	struct slave *slave;
 
 	slaves = rcu_dereference(bond->usable_slaves);
-	count = slaves ? READ_ONCE(slaves->count) : 0;
-	if (likely(count)) {
-		slave = slaves->arr[bond_xmit_hash(bond, skb) % count];
+	slave = bond_xmit_3ad_xor_slave_get(bond, skb, slaves);
+	if (likely(slave))
 		bond_dev_queue_xmit(bond, skb, slave->dev);
-	} else {
+	else
 		bond_tx_drop(dev, skb);
-	}
 
 	return NETDEV_TX_OK;
 }
diff --git a/include/net/bond_alb.h b/include/net/bond_alb.h
index b3504fcd773d..f6af76c87a6c 100644
--- a/include/net/bond_alb.h
+++ b/include/net/bond_alb.h
@@ -158,6 +158,10 @@ void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char
 void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave);
 int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev);
 int bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev);
+struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
+				      struct sk_buff *skb);
+struct slave *bond_xmit_tlb_slave_get(struct bonding *bond,
+				      struct sk_buff *skb);
 void bond_alb_monitor(struct work_struct *);
 int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr);
 void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id);
-- 
2.17.2



* [PATCH V2 mlx5-next 04/10] bonding: Implement ndo_xmit_slave_get
  2020-04-20  7:54 [PATCH V2 mlx5-next 00/10] Add support to get xmit slave Maor Gottlieb
                   ` (2 preceding siblings ...)
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 03/10] bonding: Add helpers to get xmit slave Maor Gottlieb
@ 2020-04-20  7:54 ` Maor Gottlieb
  2020-04-20 15:04   ` Jiri Pirko
  2020-04-20 15:36   ` David Ahern
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 05/10] RDMA/core: Add LAG functionality Maor Gottlieb
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-20  7:54 UTC (permalink / raw)
  To: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba
  Cc: leonro, saeedm, jiri, linux-rdma, netdev, alexr, Maor Gottlieb

Add an implementation of ndo_xmit_slave_get.
When the user sets the LAG_FLAGS_HASH_ALL_SLAVES bit and the xmit slave
selection is based on the hash, the slave will be selected from the
array of all slaves.
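
The flag's effect reduces to choosing which array the hash indexes into, as in the 8023AD/XOR arm of bond_xmit_get_slave() below. A minimal sketch of that choice; the flag value matches the series, the rest is illustrative:

```c
#include <stddef.h>

#define LAG_FLAGS_HASH_ALL_SLAVES (1 << 0)

struct toy_slaves { unsigned int count; int ids[8]; };

/* With HASH_ALL_SLAVES set the hash is taken over every slave
 * (including ones that cannot currently transmit); otherwise only
 * over the usable ones. */
static const struct toy_slaves *
toy_pick_array(const struct toy_slaves *all,
	       const struct toy_slaves *usable, unsigned short flags)
{
	return (flags & LAG_FLAGS_HASH_ALL_SLAVES) ? all : usable;
}
```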

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
---
 drivers/net/bonding/bond_main.c | 123 +++++++++++++++++++++++++++-----
 include/net/bonding.h           |   1 +
 2 files changed, 105 insertions(+), 19 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7e04be86fda8..320bcb1394fd 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4137,6 +4137,40 @@ static void bond_skip_slave(struct bond_up_slave *slaves,
 	}
 }
 
+static void bond_set_slave_arr(struct bonding *bond,
+			       struct bond_up_slave *usable_slaves,
+			       struct bond_up_slave *all_slaves)
+{
+	struct bond_up_slave *usable, *all;
+
+	usable = rtnl_dereference(bond->usable_slaves);
+	rcu_assign_pointer(bond->usable_slaves, usable_slaves);
+	if (usable)
+		kfree_rcu(usable, rcu);
+
+	all = rtnl_dereference(bond->all_slaves);
+	rcu_assign_pointer(bond->all_slaves, all_slaves);
+	if (all)
+		kfree_rcu(all, rcu);
+}
+
+static void bond_reset_slave_arr(struct bonding *bond)
+{
+	struct bond_up_slave *usable, *all;
+
+	usable = rtnl_dereference(bond->usable_slaves);
+	if (usable) {
+		RCU_INIT_POINTER(bond->usable_slaves, NULL);
+		kfree_rcu(usable, rcu);
+	}
+
+	all = rtnl_dereference(bond->all_slaves);
+	if (all) {
+		RCU_INIT_POINTER(bond->all_slaves, NULL);
+		kfree_rcu(all, rcu);
+	}
+}
+
 /* Build the usable slaves array in control path for modes that use xmit-hash
  * to determine the slave interface -
  * (a) BOND_MODE_8023AD
@@ -4147,7 +4181,7 @@ static void bond_skip_slave(struct bond_up_slave *slaves,
  */
 int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 {
-	struct bond_up_slave *usable_slaves, *old_usable_slaves;
+	struct bond_up_slave *usable_slaves = NULL, *all_slaves = NULL;
 	struct slave *slave;
 	struct list_head *iter;
 	int agg_id = 0;
@@ -4159,7 +4193,9 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 
 	usable_slaves = kzalloc(struct_size(usable_slaves, arr,
 					    bond->slave_cnt), GFP_KERNEL);
-	if (!usable_slaves) {
+	all_slaves = kzalloc(struct_size(all_slaves, arr,
+					 bond->slave_cnt), GFP_KERNEL);
+	if (!usable_slaves || !all_slaves) {
 		ret = -ENOMEM;
 		goto out;
 	}
@@ -4168,20 +4204,19 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 
 		if (bond_3ad_get_active_agg_info(bond, &ad_info)) {
 			pr_debug("bond_3ad_get_active_agg_info failed\n");
-			kfree_rcu(usable_slaves, rcu);
 			/* No active aggragator means it's not safe to use
 			 * the previous array.
 			 */
-			old_usable_slaves = rtnl_dereference(bond->usable_slaves);
-			if (old_usable_slaves) {
-				RCU_INIT_POINTER(bond->usable_slaves, NULL);
-				kfree_rcu(old_usable_slaves, rcu);
-			}
+			bond_reset_slave_arr(bond);
 			goto out;
 		}
 		agg_id = ad_info.aggregator_id;
 	}
 	bond_for_each_slave(bond, slave, iter) {
+		if (skipslave == slave)
+			continue;
+
+		all_slaves->arr[all_slaves->count++] = slave;
 		if (BOND_MODE(bond) == BOND_MODE_8023AD) {
 			struct aggregator *agg;
 
@@ -4191,8 +4226,6 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 		}
 		if (!bond_slave_can_tx(slave))
 			continue;
-		if (skipslave == slave)
-			continue;
 
 		slave_dbg(bond->dev, slave->dev, "Adding slave to tx hash array[%d]\n",
 			  usable_slaves->count);
@@ -4200,14 +4233,17 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 		usable_slaves->arr[usable_slaves->count++] = slave;
 	}
 
-	old_usable_slaves = rtnl_dereference(bond->usable_slaves);
-	rcu_assign_pointer(bond->usable_slaves, usable_slaves);
-	if (old_usable_slaves)
-		kfree_rcu(old_usable_slaves, rcu);
+	bond_set_slave_arr(bond, usable_slaves, all_slaves);
+	return ret;
 out:
-	if (ret != 0 && skipslave)
+	if (ret != 0 && skipslave) {
+		bond_skip_slave(rtnl_dereference(bond->all_slaves),
+				skipslave);
 		bond_skip_slave(rtnl_dereference(bond->usable_slaves),
 				skipslave);
+	}
+	kfree_rcu(all_slaves, rcu);
+	kfree_rcu(usable_slaves, rcu);
 
 	return ret;
 }
@@ -4313,6 +4349,48 @@ static u16 bond_select_queue(struct net_device *dev, struct sk_buff *skb,
 	return txq;
 }
 
+static struct net_device *bond_xmit_get_slave(struct net_device *master_dev,
+					      struct sk_buff *skb,
+					      u16 flags)
+{
+	struct bonding *bond = netdev_priv(master_dev);
+	struct bond_up_slave *slaves;
+	struct slave *slave = NULL;
+
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+		slave = bond_xmit_roundrobin_slave_get(bond, skb);
+		break;
+	case BOND_MODE_ACTIVEBACKUP:
+		slave = bond_xmit_activebackup_slave_get(bond, skb);
+		break;
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		if (flags & LAG_FLAGS_HASH_ALL_SLAVES)
+			slaves = rcu_dereference(bond->all_slaves);
+		else
+			slaves = rcu_dereference(bond->usable_slaves);
+		slave = bond_xmit_3ad_xor_slave_get(bond, skb, slaves);
+		break;
+	case BOND_MODE_BROADCAST:
+		break;
+	case BOND_MODE_ALB:
+		slave = bond_xmit_alb_slave_get(bond, skb);
+		break;
+	case BOND_MODE_TLB:
+		slave = bond_xmit_tlb_slave_get(bond, skb);
+		break;
+	default:
+		/* Should never happen, mode already checked */
+		WARN_ONCE(true, "Unknown bonding mode");
+		break;
+	}
+
+	if (slave)
+		return slave->dev;
+	return NULL;
+}
+
 static netdev_tx_t __bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct bonding *bond = netdev_priv(dev);
@@ -4434,6 +4512,7 @@ static const struct net_device_ops bond_netdev_ops = {
 	.ndo_del_slave		= bond_release,
 	.ndo_fix_features	= bond_fix_features,
 	.ndo_features_check	= passthru_features_check,
+	.ndo_xmit_get_slave	= bond_xmit_get_slave,
 };
 
 static const struct device_type bond_type = {
@@ -4501,9 +4580,9 @@ void bond_setup(struct net_device *bond_dev)
 static void bond_uninit(struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
+	struct bond_up_slave *usable, *all;
 	struct list_head *iter;
 	struct slave *slave;
-	struct bond_up_slave *arr;
 
 	bond_netpoll_cleanup(bond_dev);
 
@@ -4512,10 +4591,16 @@ static void bond_uninit(struct net_device *bond_dev)
 		__bond_release_one(bond_dev, slave->dev, true, true);
 	netdev_info(bond_dev, "Released all slaves\n");
 
-	arr = rtnl_dereference(bond->usable_slaves);
-	if (arr) {
+	usable = rtnl_dereference(bond->usable_slaves);
+	if (usable) {
 		RCU_INIT_POINTER(bond->usable_slaves, NULL);
-		kfree_rcu(arr, rcu);
+		kfree_rcu(usable, rcu);
+	}
+
+	all = rtnl_dereference(bond->all_slaves);
+	if (all) {
+		RCU_INIT_POINTER(bond->all_slaves, NULL);
+		kfree_rcu(all, rcu);
 	}
 
 	list_del(&bond->bond_list);
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 33bdb6d5182d..a2a7f461fa63 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -201,6 +201,7 @@ struct bonding {
 	struct   slave __rcu *current_arp_slave;
 	struct   slave __rcu *primary_slave;
 	struct   bond_up_slave __rcu *usable_slaves; /* Array of usable slaves */
+	struct   bond_up_slave __rcu *all_slaves; /* Array of all slaves */
 	bool     force_primary;
 	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
 	int     (*recv_probe)(const struct sk_buff *, struct bonding *,
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH V2 mlx5-next 05/10] RDMA/core: Add LAG functionality
  2020-04-20  7:54 [PATCH V2 mlx5-next 00/10] Add support to get xmit slave Maor Gottlieb
                   ` (3 preceding siblings ...)
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 04/10] bonding: Implement ndo_xmit_slave_get Maor Gottlieb
@ 2020-04-20  7:54 ` Maor Gottlieb
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 06/10] RDMA/core: Get xmit slave for LAG Maor Gottlieb
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-20  7:54 UTC (permalink / raw)
  To: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba
  Cc: leonro, saeedm, jiri, linux-rdma, netdev, alexr, Maor Gottlieb

Add support to get the RoCE LAG xmit slave by building an skb that
mimics the RoCE packet headers and passing it to master_xmit_get_slave.
A driver that wants the slave selected as if all slaves were available
needs to set RDMA_LAG_FLAGS_HASH_ALL_SLAVES in its lag_flags.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
---
 drivers/infiniband/core/Makefile |   2 +-
 drivers/infiniband/core/lag.c    | 139 +++++++++++++++++++++++++++++++
 include/rdma/ib_verbs.h          |   2 +
 include/rdma/lag.h               |  22 +++++
 4 files changed, 164 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/core/lag.c
 create mode 100644 include/rdma/lag.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index d1b14887960e..870f0fcd54d5 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -12,7 +12,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
 				multicast.o mad.o smi.o agent.o mad_rmpp.o \
 				nldev.o restrack.o counters.o ib_core_uverbs.o \
-				trace.o
+				trace.o lag.o
 
 ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
 ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
diff --git a/drivers/infiniband/core/lag.c b/drivers/infiniband/core/lag.c
new file mode 100644
index 000000000000..007c06cae3d8
--- /dev/null
+++ b/drivers/infiniband/core/lag.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2020 Mellanox Technologies. All rights reserved.
+ */
+
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_cache.h>
+#include <rdma/lag.h>
+
+static struct sk_buff *rdma_build_skb(struct ib_device *device,
+				      struct net_device *netdev,
+				      struct rdma_ah_attr *ah_attr)
+{
+	struct ipv6hdr *ip6h;
+	struct sk_buff *skb;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+	struct udphdr *uh;
+	u8 smac[ETH_ALEN];
+	bool is_ipv4;
+	int hdr_len;
+
+	is_ipv4 = ipv6_addr_v4mapped((struct in6_addr *)ah_attr->grh.dgid.raw);
+	hdr_len = ETH_HLEN + sizeof(struct udphdr) + LL_RESERVED_SPACE(netdev);
+	hdr_len += is_ipv4 ? sizeof(struct iphdr) : sizeof(struct ipv6hdr);
+
+	skb = alloc_skb(hdr_len, GFP_ATOMIC);
+	if (!skb)
+		return NULL;
+
+	skb->dev = netdev;
+	skb_reserve(skb, hdr_len);
+	skb_push(skb, sizeof(struct udphdr));
+	skb_reset_transport_header(skb);
+	uh = udp_hdr(skb);
+	uh->source = htons(0xC000);
+	uh->dest = htons(ROCE_V2_UDP_DPORT);
+	uh->len = htons(sizeof(struct udphdr));
+
+	if (is_ipv4) {
+		skb_push(skb, sizeof(struct iphdr));
+		skb_reset_network_header(skb);
+		iph = ip_hdr(skb);
+		iph->frag_off = 0;
+		iph->version = 4;
+		iph->protocol = IPPROTO_UDP;
+		iph->ihl = 0x5;
+		iph->tot_len = htons(sizeof(struct udphdr) + sizeof(struct
+								    iphdr));
+		memcpy(&iph->saddr, ah_attr->grh.sgid_attr->gid.raw + 12,
+		       sizeof(struct in_addr));
+		memcpy(&iph->daddr, ah_attr->grh.dgid.raw + 12,
+		       sizeof(struct in_addr));
+	} else {
+		skb_push(skb, sizeof(struct ipv6hdr));
+		skb_reset_network_header(skb);
+		ip6h = ipv6_hdr(skb);
+		ip6h->version = 6;
+		ip6h->nexthdr = IPPROTO_UDP;
+		memcpy(&ip6h->flow_lbl, &ah_attr->grh.flow_label,
+		       sizeof(*ip6h->flow_lbl));
+		memcpy(&ip6h->saddr, ah_attr->grh.sgid_attr->gid.raw,
+		       sizeof(struct in6_addr));
+		memcpy(&ip6h->daddr, ah_attr->grh.dgid.raw,
+		       sizeof(struct in6_addr));
+	}
+
+	skb_push(skb, sizeof(struct ethhdr));
+	skb_reset_mac_header(skb);
+	eth = eth_hdr(skb);
+	skb->protocol = eth->h_proto = htons(is_ipv4 ? ETH_P_IP : ETH_P_IPV6);
+	rdma_read_gid_l2_fields(ah_attr->grh.sgid_attr, NULL, smac);
+	memcpy(eth->h_source, smac, ETH_ALEN);
+	memcpy(eth->h_dest, ah_attr->roce.dmac, ETH_ALEN);
+
+	return skb;
+}
+
+static struct net_device *rdma_get_xmit_slave_udp(struct ib_device *device,
+						  struct net_device *master,
+						  struct rdma_ah_attr *ah_attr)
+{
+	struct net_device *slave;
+	struct sk_buff *skb;
+	u16 flags;
+
+	skb = rdma_build_skb(device, master, ah_attr);
+	if (!skb)
+		return NULL;
+
+	flags = device->lag_flags & RDMA_LAG_FLAGS_HASH_ALL_SLAVES ?
+		LAG_FLAGS_HASH_ALL_SLAVES : 0;
+	slave = master_xmit_get_slave(master, skb, flags);
+	kfree_skb(skb);
+	return slave;
+}
+
+void rdma_lag_put_ah_roce_slave(struct rdma_ah_attr *ah_attr)
+{
+	if (ah_attr->roce.xmit_slave)
+		dev_put(ah_attr->roce.xmit_slave);
+}
+
+int rdma_lag_get_ah_roce_slave(struct ib_device *device,
+			       struct rdma_ah_attr *ah_attr)
+{
+	struct net_device *master;
+	struct net_device *slave;
+
+	if (!(ah_attr->type == RDMA_AH_ATTR_TYPE_ROCE &&
+	      ah_attr->grh.sgid_attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP))
+		return 0;
+
+	rcu_read_lock();
+	master = rdma_read_gid_attr_ndev_rcu(ah_attr->grh.sgid_attr);
+	if (IS_ERR(master)) {
+		rcu_read_unlock();
+		return PTR_ERR(master);
+	}
+	dev_hold(master);
+	rcu_read_unlock();
+
+	if (!netif_is_bond_master(master)) {
+		dev_put(master);
+		return 0;
+	}
+
+	slave = rdma_get_xmit_slave_udp(device, master, ah_attr);
+
+	dev_put(master);
+	if (!slave) {
+		ibdev_warn(device, "Failed to get lag xmit slave\n");
+		return -EINVAL;
+	}
+
+	ah_attr->roce.xmit_slave = slave;
+
+	return 0;
+}
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index bbc5cfb57cd2..60f9969b6d83 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -894,6 +894,7 @@ struct ib_ah_attr {
 
 struct roce_ah_attr {
 	u8			dmac[ETH_ALEN];
+	struct net_device	*xmit_slave;
 };
 
 struct opa_ah_attr {
@@ -2709,6 +2710,7 @@ struct ib_device {
 	/* Used by iWarp CM */
 	char iw_ifname[IFNAMSIZ];
 	u32 iw_driver_flags;
+	u32 lag_flags;
 };
 
 struct ib_client_nl_info;
diff --git a/include/rdma/lag.h b/include/rdma/lag.h
new file mode 100644
index 000000000000..a71511824207
--- /dev/null
+++ b/include/rdma/lag.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/*
+ * Copyright (c) 2020 Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef _RDMA_LAG_H_
+#define _RDMA_LAG_H_
+
+#include <net/lag.h>
+
+struct ib_device;
+struct rdma_ah_attr;
+
+enum rdma_lag_flags {
+	RDMA_LAG_FLAGS_HASH_ALL_SLAVES = 1 << 0
+};
+
+void rdma_lag_put_ah_roce_slave(struct rdma_ah_attr *ah_attr);
+int rdma_lag_get_ah_roce_slave(struct ib_device *device,
+			       struct rdma_ah_attr *ah_attr);
+
+#endif /* _RDMA_LAG_H_ */
-- 
2.17.2



* [PATCH V2 mlx5-next 06/10] RDMA/core: Get xmit slave for LAG
  2020-04-20  7:54 [PATCH V2 mlx5-next 00/10] Add support to get xmit slave Maor Gottlieb
                   ` (4 preceding siblings ...)
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 05/10] RDMA/core: Add LAG functionality Maor Gottlieb
@ 2020-04-20  7:54 ` Maor Gottlieb
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 07/10] net/mlx5: Change lag mutex lock to spin lock Maor Gottlieb
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-20  7:54 UTC (permalink / raw)
  To: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba
  Cc: leonro, saeedm, jiri, linux-rdma, netdev, alexr, Maor Gottlieb

Add a call to rdma_lag_get_ah_roce_slave when an address handle
is created.
The low-level driver can use it to select the QP's affinity port.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
---
 drivers/infiniband/core/verbs.c | 44 ++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 56a71337112c..a0d60376ba6b 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -50,6 +50,7 @@
 #include <rdma/ib_cache.h>
 #include <rdma/ib_addr.h>
 #include <rdma/rw.h>
+#include <rdma/lag.h>
 
 #include "core_priv.h"
 #include <trace/events/rdma_core.h>
@@ -554,8 +555,14 @@ struct ib_ah *rdma_create_ah(struct ib_pd *pd, struct rdma_ah_attr *ah_attr,
 	if (ret)
 		return ERR_PTR(ret);
 
-	ah = _rdma_create_ah(pd, ah_attr, flags, NULL);
+	ret = rdma_lag_get_ah_roce_slave(pd->device, ah_attr);
+	if (ret) {
+		rdma_unfill_sgid_attr(ah_attr, old_sgid_attr);
+		return ERR_PTR(ret);
+	}
 
+	ah = _rdma_create_ah(pd, ah_attr, flags, NULL);
+	rdma_lag_put_ah_roce_slave(ah_attr);
 	rdma_unfill_sgid_attr(ah_attr, old_sgid_attr);
 	return ah;
 }
@@ -1638,6 +1645,25 @@ static int _ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *attr,
 					  &old_sgid_attr_av);
 		if (ret)
 			return ret;
+
+		if (attr->ah_attr.type == RDMA_AH_ATTR_TYPE_ROCE &&
+		    is_qp_type_connected(qp)) {
+			/*
+			 * If the user provided the qp_attr then we have to
+			 * resolve it. Kernel users have to provide already
+			 * resolved rdma_ah_attr's.
+			 */
+			if (udata) {
+				ret = ib_resolve_eth_dmac(qp->device,
+							  &attr->ah_attr);
+				if (ret)
+					goto out_av;
+			}
+			ret = rdma_lag_get_ah_roce_slave(qp->device,
+							 &attr->ah_attr);
+			if (ret)
+				goto out_av;
+		}
 	}
 	if (attr_mask & IB_QP_ALT_PATH) {
 		/*
@@ -1664,18 +1690,6 @@ static int _ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *attr,
 		}
 	}
 
-	/*
-	 * If the user provided the qp_attr then we have to resolve it. Kernel
-	 * users have to provide already resolved rdma_ah_attr's
-	 */
-	if (udata && (attr_mask & IB_QP_AV) &&
-	    attr->ah_attr.type == RDMA_AH_ATTR_TYPE_ROCE &&
-	    is_qp_type_connected(qp)) {
-		ret = ib_resolve_eth_dmac(qp->device, &attr->ah_attr);
-		if (ret)
-			goto out;
-	}
-
 	if (rdma_ib_or_roce(qp->device, port)) {
 		if (attr_mask & IB_QP_RQ_PSN && attr->rq_psn & ~0xffffff) {
 			dev_warn(&qp->device->dev,
@@ -1717,8 +1731,10 @@ static int _ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *attr,
 	if (attr_mask & IB_QP_ALT_PATH)
 		rdma_unfill_sgid_attr(&attr->alt_ah_attr, old_sgid_attr_alt_av);
 out_av:
-	if (attr_mask & IB_QP_AV)
+	if (attr_mask & IB_QP_AV) {
+		rdma_lag_put_ah_roce_slave(&attr->ah_attr);
 		rdma_unfill_sgid_attr(&attr->ah_attr, old_sgid_attr_av);
+	}
 	return ret;
 }
 
-- 
2.17.2



* [PATCH V2 mlx5-next 07/10] net/mlx5: Change lag mutex lock to spin lock
  2020-04-20  7:54 [PATCH V2 mlx5-next 00/10] Add support to get xmit slave Maor Gottlieb
                   ` (5 preceding siblings ...)
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 06/10] RDMA/core: Get xmit slave for LAG Maor Gottlieb
@ 2020-04-20  7:54 ` Maor Gottlieb
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 08/10] net/mlx5: Add support to get lag physical port Maor Gottlieb
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-20  7:54 UTC (permalink / raw)
  To: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba
  Cc: leonro, saeedm, jiri, linux-rdma, netdev, alexr, Maor Gottlieb

The lag lock does not need to be a mutex: the critical sections are
short and there is no need for the thread to sleep.
Change the lock that protects the LAG structure from a mutex to a
spin lock. This is required by the next patch, which needs to access
this structure from a context that cannot sleep.
In addition, there is no need to hold this lock while querying the
congestion counters, so move that query outside the critical section.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lag.c | 42 +++++++++----------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
index 93052b07c76c..496a3408d771 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
@@ -42,7 +42,7 @@
  * Beware of lock dependencies (preferably, no locks should be acquired
  * under it).
  */
-static DEFINE_MUTEX(lag_mutex);
+static DEFINE_SPINLOCK(lag_lock);
 
 static int mlx5_cmd_create_lag(struct mlx5_core_dev *dev, u8 remap_port1,
 			       u8 remap_port2)
@@ -297,9 +297,9 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
 	if (!dev0 || !dev1)
 		return;
 
-	mutex_lock(&lag_mutex);
+	spin_lock(&lag_lock);
 	tracker = ldev->tracker;
-	mutex_unlock(&lag_mutex);
+	spin_unlock(&lag_lock);
 
 	do_bond = tracker.is_bonded && mlx5_lag_check_prereq(ldev);
 
@@ -481,9 +481,9 @@ static int mlx5_lag_netdev_event(struct notifier_block *this,
 		break;
 	}
 
-	mutex_lock(&lag_mutex);
+	spin_lock(&lag_lock);
 	ldev->tracker = tracker;
-	mutex_unlock(&lag_mutex);
+	spin_unlock(&lag_lock);
 
 	if (changed)
 		mlx5_queue_bond_work(ldev, 0);
@@ -525,7 +525,7 @@ static void mlx5_lag_dev_add_pf(struct mlx5_lag *ldev,
 	if (fn >= MLX5_MAX_PORTS)
 		return;
 
-	mutex_lock(&lag_mutex);
+	spin_lock(&lag_lock);
 	ldev->pf[fn].dev    = dev;
 	ldev->pf[fn].netdev = netdev;
 	ldev->tracker.netdev_state[fn].link_up = 0;
@@ -533,7 +533,7 @@ static void mlx5_lag_dev_add_pf(struct mlx5_lag *ldev,
 
 	dev->priv.lag = ldev;
 
-	mutex_unlock(&lag_mutex);
+	spin_unlock(&lag_lock);
 }
 
 static void mlx5_lag_dev_remove_pf(struct mlx5_lag *ldev,
@@ -548,11 +548,11 @@ static void mlx5_lag_dev_remove_pf(struct mlx5_lag *ldev,
 	if (i == MLX5_MAX_PORTS)
 		return;
 
-	mutex_lock(&lag_mutex);
+	spin_lock(&lag_lock);
 	memset(&ldev->pf[i], 0, sizeof(*ldev->pf));
 
 	dev->priv.lag = NULL;
-	mutex_unlock(&lag_mutex);
+	spin_unlock(&lag_lock);
 }
 
 /* Must be called with intf_mutex held */
@@ -630,10 +630,10 @@ bool mlx5_lag_is_roce(struct mlx5_core_dev *dev)
 	struct mlx5_lag *ldev;
 	bool res;
 
-	mutex_lock(&lag_mutex);
+	spin_lock(&lag_lock);
 	ldev = mlx5_lag_dev_get(dev);
 	res  = ldev && __mlx5_lag_is_roce(ldev);
-	mutex_unlock(&lag_mutex);
+	spin_unlock(&lag_lock);
 
 	return res;
 }
@@ -644,10 +644,10 @@ bool mlx5_lag_is_active(struct mlx5_core_dev *dev)
 	struct mlx5_lag *ldev;
 	bool res;
 
-	mutex_lock(&lag_mutex);
+	spin_lock(&lag_lock);
 	ldev = mlx5_lag_dev_get(dev);
 	res  = ldev && __mlx5_lag_is_active(ldev);
-	mutex_unlock(&lag_mutex);
+	spin_unlock(&lag_lock);
 
 	return res;
 }
@@ -658,10 +658,10 @@ bool mlx5_lag_is_sriov(struct mlx5_core_dev *dev)
 	struct mlx5_lag *ldev;
 	bool res;
 
-	mutex_lock(&lag_mutex);
+	spin_lock(&lag_lock);
 	ldev = mlx5_lag_dev_get(dev);
 	res  = ldev && __mlx5_lag_is_sriov(ldev);
-	mutex_unlock(&lag_mutex);
+	spin_unlock(&lag_lock);
 
 	return res;
 }
@@ -687,7 +687,7 @@ struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev)
 	struct net_device *ndev = NULL;
 	struct mlx5_lag *ldev;
 
-	mutex_lock(&lag_mutex);
+	spin_lock(&lag_lock);
 	ldev = mlx5_lag_dev_get(dev);
 
 	if (!(ldev && __mlx5_lag_is_roce(ldev)))
@@ -704,7 +704,7 @@ struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev)
 		dev_hold(ndev);
 
 unlock:
-	mutex_unlock(&lag_mutex);
+	spin_unlock(&lag_lock);
 
 	return ndev;
 }
@@ -746,7 +746,7 @@ int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
 
 	memset(values, 0, sizeof(*values) * num_counters);
 
-	mutex_lock(&lag_mutex);
+	spin_lock(&lag_lock);
 	ldev = mlx5_lag_dev_get(dev);
 	if (ldev && __mlx5_lag_is_roce(ldev)) {
 		num_ports = MLX5_MAX_PORTS;
@@ -756,18 +756,18 @@ int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
 		num_ports = 1;
 		mdev[MLX5_LAG_P1] = dev;
 	}
+	spin_unlock(&lag_lock);
 
 	for (i = 0; i < num_ports; ++i) {
 		ret = mlx5_cmd_query_cong_counter(mdev[i], false, out, outlen);
 		if (ret)
-			goto unlock;
+			goto free;
 
 		for (j = 0; j < num_counters; ++j)
 			values[j] += be64_to_cpup((__be64 *)(out + offsets[j]));
 	}
 
-unlock:
-	mutex_unlock(&lag_mutex);
+free:
 	kvfree(out);
 	return ret;
 }
-- 
2.17.2



* [PATCH V2 mlx5-next 08/10] net/mlx5: Add support to get lag physical port
  2020-04-20  7:54 [PATCH V2 mlx5-next 00/10] Add support to get xmit slave Maor Gottlieb
                   ` (6 preceding siblings ...)
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 07/10] net/mlx5: Change lag mutex lock to spin lock Maor Gottlieb
@ 2020-04-20  7:54 ` Maor Gottlieb
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 09/10] RDMA/mlx5: Refactor affinity related code Maor Gottlieb
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 10/10] RDMA/mlx5: Set lag tx affinity according to slave Maor Gottlieb
  9 siblings, 0 replies; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-20  7:54 UTC (permalink / raw)
  To: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba
  Cc: leonro, saeedm, jiri, linux-rdma, netdev, alexr, Maor Gottlieb

Add a function to get the device physical port of a LAG slave netdevice.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lag.c | 24 +++++++++++++++++++
 include/linux/mlx5/driver.h                   |  2 ++
 2 files changed, 26 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
index 496a3408d771..5461fbe47c0d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag.c
@@ -710,6 +710,30 @@ struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev)
 }
 EXPORT_SYMBOL(mlx5_lag_get_roce_netdev);
 
+u8 mlx5_lag_get_slave_port(struct mlx5_core_dev *dev,
+			   struct net_device *slave)
+{
+	struct mlx5_lag *ldev;
+	u8 port = 0;
+
+	spin_lock(&lag_lock);
+	ldev = mlx5_lag_dev_get(dev);
+	if (!(ldev && __mlx5_lag_is_roce(ldev)))
+		goto unlock;
+
+	if (ldev->pf[MLX5_LAG_P1].netdev == slave)
+		port = MLX5_LAG_P1;
+	else
+		port = MLX5_LAG_P2;
+
+	port = ldev->v2p_map[port];
+
+unlock:
+	spin_unlock(&lag_lock);
+	return port;
+}
+EXPORT_SYMBOL(mlx5_lag_get_slave_port);
+
 bool mlx5_lag_intf_add(struct mlx5_interface *intf, struct mlx5_priv *priv)
 {
 	struct mlx5_core_dev *dev = container_of(priv, struct mlx5_core_dev,
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 6f8f79ef829b..7b81b512d116 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1062,6 +1062,8 @@ bool mlx5_lag_is_sriov(struct mlx5_core_dev *dev);
 bool mlx5_lag_is_multipath(struct mlx5_core_dev *dev);
 bool mlx5_lag_is_active(struct mlx5_core_dev *dev);
 struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev);
+u8 mlx5_lag_get_slave_port(struct mlx5_core_dev *dev,
+			   struct net_device *slave);
 int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
 				 u64 *values,
 				 int num_counters,
-- 
2.17.2



* [PATCH V2 mlx5-next 09/10] RDMA/mlx5: Refactor affinity related code
  2020-04-20  7:54 [PATCH V2 mlx5-next 00/10] Add support to get xmit slave Maor Gottlieb
                   ` (7 preceding siblings ...)
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 08/10] net/mlx5: Add support to get lag physical port Maor Gottlieb
@ 2020-04-20  7:54 ` Maor Gottlieb
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 10/10] RDMA/mlx5: Set lag tx affinity according to slave Maor Gottlieb
  9 siblings, 0 replies; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-20  7:54 UTC (permalink / raw)
  To: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba
  Cc: leonro, saeedm, jiri, linux-rdma, netdev, alexr, Maor Gottlieb

Move the affinity-related code in modify QP into its own function.
This is a preparation for the next patch, which extends the affinity
calculation to consider the xmit slave.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 90 +++++++++++++++++++--------------
 1 file changed, 53 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 1456db4b6295..a45499809903 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3416,33 +3416,61 @@ static int modify_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
 	return 0;
 }
 
-static unsigned int get_tx_affinity(struct mlx5_ib_dev *dev,
-				    struct mlx5_ib_pd *pd,
-				    struct mlx5_ib_qp_base *qp_base,
-				    u8 port_num, struct ib_udata *udata)
+static unsigned int get_tx_affinity_rr(struct mlx5_ib_dev *dev,
+				       struct ib_udata *udata)
 {
 	struct mlx5_ib_ucontext *ucontext = rdma_udata_to_drv_context(
 		udata, struct mlx5_ib_ucontext, ibucontext);
-	unsigned int tx_port_affinity;
+	u8 port_num = mlx5_core_native_port_num(dev->mdev) - 1;
+	atomic_t *tx_port_affinity;
 
-	if (ucontext) {
-		tx_port_affinity = (unsigned int)atomic_add_return(
-					   1, &ucontext->tx_port_affinity) %
-					   MLX5_MAX_PORTS +
-				   1;
+	if (ucontext)
+		tx_port_affinity = &ucontext->tx_port_affinity;
+	else
+		tx_port_affinity = &dev->port[port_num].roce.tx_port_affinity;
+
+	return (unsigned int)atomic_add_return(1, tx_port_affinity) %
+		MLX5_MAX_PORTS + 1;
+}
+
+static bool qp_supports_affinity(struct ib_qp *qp)
+{
+	struct mlx5_ib_qp *mqp = to_mqp(qp);
+
+	if ((qp->qp_type == IB_QPT_RC) ||
+	    (qp->qp_type == IB_QPT_UD &&
+	     !(mqp->flags & MLX5_IB_QP_SQPN_QP1)) ||
+	    (qp->qp_type == IB_QPT_UC) ||
+	    (qp->qp_type == IB_QPT_RAW_PACKET) ||
+	    (qp->qp_type == IB_QPT_XRC_INI) ||
+	    (qp->qp_type == IB_QPT_XRC_TGT))
+		return true;
+	return false;
+}
+
+static unsigned int get_tx_affinity(struct ib_qp *qp, u8 init,
+				    struct ib_udata *udata)
+{
+	struct mlx5_ib_ucontext *ucontext = rdma_udata_to_drv_context(
+		udata, struct mlx5_ib_ucontext, ibucontext);
+	struct mlx5_ib_dev *dev = to_mdev(qp->device);
+	struct mlx5_ib_qp *mqp = to_mqp(qp);
+	struct mlx5_ib_qp_base *qp_base;
+	unsigned int tx_affinity;
+
+	if (!(dev->lag_active && init && qp_supports_affinity(qp)))
+		return 0;
+
+	tx_affinity = get_tx_affinity_rr(dev, udata);
+
+	qp_base = &mqp->trans_qp.base;
+	if (ucontext)
 		mlx5_ib_dbg(dev, "Set tx affinity 0x%x to qpn 0x%x ucontext %p\n",
-				tx_port_affinity, qp_base->mqp.qpn, ucontext);
-	} else {
-		tx_port_affinity =
-			(unsigned int)atomic_add_return(
-				1, &dev->port[port_num].roce.tx_port_affinity) %
-				MLX5_MAX_PORTS +
-			1;
+			    tx_affinity, qp_base->mqp.qpn, ucontext);
+	else
 		mlx5_ib_dbg(dev, "Set tx affinity 0x%x to qpn 0x%x\n",
-				tx_port_affinity, qp_base->mqp.qpn);
-	}
-
-	return tx_port_affinity;
+			    tx_affinity, qp_base->mqp.qpn);
+	return tx_affinity;
 }
 
 static int __mlx5_ib_qp_set_counter(struct ib_qp *qp,
@@ -3554,22 +3582,10 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 		}
 	}
 
-	if ((cur_state == IB_QPS_RESET) && (new_state == IB_QPS_INIT)) {
-		if ((ibqp->qp_type == IB_QPT_RC) ||
-		    (ibqp->qp_type == IB_QPT_UD &&
-		     !(qp->flags & MLX5_IB_QP_SQPN_QP1)) ||
-		    (ibqp->qp_type == IB_QPT_UC) ||
-		    (ibqp->qp_type == IB_QPT_RAW_PACKET) ||
-		    (ibqp->qp_type == IB_QPT_XRC_INI) ||
-		    (ibqp->qp_type == IB_QPT_XRC_TGT)) {
-			if (dev->lag_active) {
-				u8 p = mlx5_core_native_port_num(dev->mdev) - 1;
-				tx_affinity = get_tx_affinity(dev, pd, base, p,
-							      udata);
-				context->flags |= cpu_to_be32(tx_affinity << 24);
-			}
-		}
-	}
+	tx_affinity = get_tx_affinity(ibqp,
+				      cur_state == IB_QPS_RESET &&
+				      new_state == IB_QPS_INIT, udata);
+	context->flags |= cpu_to_be32(tx_affinity << 24);
 
 	if (is_sqp(ibqp->qp_type)) {
 		context->mtu_msgmax = (IB_MTU_256 << 5) | 8;
-- 
2.17.2



* [PATCH V2 mlx5-next 10/10] RDMA/mlx5: Set lag tx affinity according to slave
  2020-04-20  7:54 [PATCH V2 mlx5-next 00/10] Add support to get xmit slave Maor Gottlieb
                   ` (8 preceding siblings ...)
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 09/10] RDMA/mlx5: Refactor affinity related code Maor Gottlieb
@ 2020-04-20  7:54 ` Maor Gottlieb
  9 siblings, 0 replies; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-20  7:54 UTC (permalink / raw)
  To: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba
  Cc: leonro, saeedm, jiri, linux-rdma, netdev, alexr, Maor Gottlieb

This patch sets the LAG tx affinity of the data QPs and the GSI QPs
according to the LAG xmit slave.

For GSI QPs, in case the link layer is Ethernet (RoCE), we create two
GSI QPs, one for each physical port. When the driver selects the GSI
QP, it takes the port affinity into account.
For connected QPs, the driver sets the affinity according to the xmit
slave.

The above ensures that an RC QP and its corresponding GSI QP transmit
from the same physical port.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
---
 drivers/infiniband/hw/mlx5/ah.c      |  4 +++
 drivers/infiniband/hw/mlx5/gsi.c     | 34 ++++++++++++++----
 drivers/infiniband/hw/mlx5/main.c    |  2 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
 drivers/infiniband/hw/mlx5/qp.c      | 53 +++++++++++++++++++---------
 include/linux/mlx5/mlx5_ifc.h        |  4 ++-
 include/linux/mlx5/qp.h              |  2 ++
 7 files changed, 75 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
index 80642dd359bc..14ad05e7c5bf 100644
--- a/drivers/infiniband/hw/mlx5/ah.c
+++ b/drivers/infiniband/hw/mlx5/ah.c
@@ -51,6 +51,10 @@ static void create_ib_ah(struct mlx5_ib_dev *dev, struct mlx5_ib_ah *ah,
 	ah->av.stat_rate_sl = (rdma_ah_get_static_rate(ah_attr) << 4);
 
 	if (ah_attr->type == RDMA_AH_ATTR_TYPE_ROCE) {
+		if (ah_attr->roce.xmit_slave)
+			ah->xmit_port =
+				mlx5_lag_get_slave_port(dev->mdev,
+							ah_attr->roce.xmit_slave);
 		gid_type = ah_attr->grh.sgid_attr->gid_type;
 
 		memcpy(ah->av.rmac, ah_attr->roce.dmac,
diff --git a/drivers/infiniband/hw/mlx5/gsi.c b/drivers/infiniband/hw/mlx5/gsi.c
index 1ae6fd95acaa..fbae1c094fe2 100644
--- a/drivers/infiniband/hw/mlx5/gsi.c
+++ b/drivers/infiniband/hw/mlx5/gsi.c
@@ -119,12 +119,20 @@ struct ib_qp *mlx5_ib_gsi_create_qp(struct ib_pd *pd,
 	struct mlx5_ib_gsi_qp *gsi;
 	struct ib_qp_init_attr hw_init_attr = *init_attr;
 	const u8 port_num = init_attr->port_num;
-	const int num_pkeys = pd->device->attrs.max_pkeys;
-	const int num_qps = mlx5_ib_deth_sqpn_cap(dev) ? num_pkeys : 0;
+	int num_qps = 0;
 	int ret;
 
 	mlx5_ib_dbg(dev, "creating GSI QP\n");
 
+	if (mlx5_ib_deth_sqpn_cap(dev)) {
+		if (MLX5_CAP_GEN(dev->mdev,
+				 port_type) == MLX5_CAP_PORT_TYPE_IB)
+			num_qps = pd->device->attrs.max_pkeys;
+		else if (dev->lag_active)
+			num_qps = MLX5_MAX_PORTS;
+	}
+
+
 	if (port_num > ARRAY_SIZE(dev->devr.ports) || port_num < 1) {
 		mlx5_ib_warn(dev,
 			     "invalid port number %d during GSI QP creation\n",
@@ -270,7 +278,7 @@ static struct ib_qp *create_gsi_ud_qp(struct mlx5_ib_gsi_qp *gsi)
 }
 
 static int modify_to_rts(struct mlx5_ib_gsi_qp *gsi, struct ib_qp *qp,
-			 u16 qp_index)
+			 u16 pkey_index)
 {
 	struct mlx5_ib_dev *dev = to_mdev(qp->device);
 	struct ib_qp_attr attr;
@@ -279,7 +287,7 @@ static int modify_to_rts(struct mlx5_ib_gsi_qp *gsi, struct ib_qp *qp,
 
 	mask = IB_QP_STATE | IB_QP_PKEY_INDEX | IB_QP_QKEY | IB_QP_PORT;
 	attr.qp_state = IB_QPS_INIT;
-	attr.pkey_index = qp_index;
+	attr.pkey_index = pkey_index;
 	attr.qkey = IB_QP1_QKEY;
 	attr.port_num = gsi->port_num;
 	ret = ib_modify_qp(qp, &attr, mask);
@@ -313,12 +321,17 @@ static void setup_qp(struct mlx5_ib_gsi_qp *gsi, u16 qp_index)
 {
 	struct ib_device *device = gsi->rx_qp->device;
 	struct mlx5_ib_dev *dev = to_mdev(device);
+	int pkey_index = qp_index;
+	struct mlx5_ib_qp *mqp;
 	struct ib_qp *qp;
 	unsigned long flags;
 	u16 pkey;
 	int ret;
 
-	ret = ib_query_pkey(device, gsi->port_num, qp_index, &pkey);
+	if (MLX5_CAP_GEN(dev->mdev,  port_type) != MLX5_CAP_PORT_TYPE_IB)
+		pkey_index = 0;
+
+	ret = ib_query_pkey(device, gsi->port_num, pkey_index, &pkey);
 	if (ret) {
 		mlx5_ib_warn(dev, "unable to read P_Key at port %d, index %d\n",
 			     gsi->port_num, qp_index);
@@ -347,7 +360,10 @@ static void setup_qp(struct mlx5_ib_gsi_qp *gsi, u16 qp_index)
 		return;
 	}
 
-	ret = modify_to_rts(gsi, qp, qp_index);
+	mqp = to_mqp(qp);
+	if (dev->lag_active)
+		mqp->gsi_lag_port = qp_index + 1;
+	ret = modify_to_rts(gsi, qp, pkey_index);
 	if (ret)
 		goto err_destroy_qp;
 
@@ -466,11 +482,15 @@ static int mlx5_ib_gsi_silent_drop(struct mlx5_ib_gsi_qp *gsi,
 static struct ib_qp *get_tx_qp(struct mlx5_ib_gsi_qp *gsi, struct ib_ud_wr *wr)
 {
 	struct mlx5_ib_dev *dev = to_mdev(gsi->rx_qp->device);
+	struct mlx5_ib_ah *ah = to_mah(wr->ah);
 	int qp_index = wr->pkey_index;
 
-	if (!mlx5_ib_deth_sqpn_cap(dev))
+	if (!gsi->num_qps)
 		return gsi->rx_qp;
 
+	if (dev->lag_active && ah->xmit_port)
+		qp_index = ah->xmit_port - 1;
+
 	if (qp_index >= gsi->num_qps)
 		return NULL;
 
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 6679756506e6..2db2309dde47 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -53,6 +53,7 @@
 #include <linux/list.h>
 #include <rdma/ib_smi.h>
 #include <rdma/ib_umem.h>
+#include <rdma/lag.h>
 #include <linux/in.h>
 #include <linux/etherdevice.h>
 #include "mlx5_ib.h"
@@ -6549,6 +6550,7 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	dev->ib_dev.phys_port_cnt	= dev->num_ports;
 	dev->ib_dev.num_comp_vectors    = mlx5_comp_vectors_count(mdev);
 	dev->ib_dev.dev.parent		= mdev->device;
+	dev->ib_dev.lag_flags		= RDMA_LAG_FLAGS_HASH_ALL_SLAVES;
 
 	mutex_init(&dev->cap_mask_mutex);
 	INIT_LIST_HEAD(&dev->qp_list);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index a4e522385de0..a7b5581a7a4d 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -471,6 +471,7 @@ struct mlx5_ib_qp {
 	 * but not take effective
 	 */
 	u32                     counter_pending;
+	u16			gsi_lag_port;
 };
 
 struct mlx5_ib_cq_buf {
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index a45499809903..9e9ad69152f7 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3052,10 +3052,12 @@ static enum mlx5_qp_optpar opt_mask[MLX5_QP_NUM_STATE][MLX5_QP_NUM_STATE][MLX5_Q
 					  MLX5_QP_OPTPAR_RAE		|
 					  MLX5_QP_OPTPAR_RWE		|
 					  MLX5_QP_OPTPAR_PKEY_INDEX	|
-					  MLX5_QP_OPTPAR_PRI_PORT,
+					  MLX5_QP_OPTPAR_PRI_PORT	|
+					  MLX5_QP_OPTPAR_LAG_TX_AFF,
 			[MLX5_QP_ST_UC] = MLX5_QP_OPTPAR_RWE		|
 					  MLX5_QP_OPTPAR_PKEY_INDEX	|
-					  MLX5_QP_OPTPAR_PRI_PORT,
+					  MLX5_QP_OPTPAR_PRI_PORT	|
+					  MLX5_QP_OPTPAR_LAG_TX_AFF,
 			[MLX5_QP_ST_UD] = MLX5_QP_OPTPAR_PKEY_INDEX	|
 					  MLX5_QP_OPTPAR_Q_KEY		|
 					  MLX5_QP_OPTPAR_PRI_PORT,
@@ -3063,17 +3065,20 @@ static enum mlx5_qp_optpar opt_mask[MLX5_QP_NUM_STATE][MLX5_QP_NUM_STATE][MLX5_Q
 					  MLX5_QP_OPTPAR_RAE		|
 					  MLX5_QP_OPTPAR_RWE		|
 					  MLX5_QP_OPTPAR_PKEY_INDEX	|
-					  MLX5_QP_OPTPAR_PRI_PORT,
+					  MLX5_QP_OPTPAR_PRI_PORT	|
+					  MLX5_QP_OPTPAR_LAG_TX_AFF,
 		},
 		[MLX5_QP_STATE_RTR] = {
 			[MLX5_QP_ST_RC] = MLX5_QP_OPTPAR_ALT_ADDR_PATH  |
 					  MLX5_QP_OPTPAR_RRE            |
 					  MLX5_QP_OPTPAR_RAE            |
 					  MLX5_QP_OPTPAR_RWE            |
-					  MLX5_QP_OPTPAR_PKEY_INDEX,
+					  MLX5_QP_OPTPAR_PKEY_INDEX	|
+					  MLX5_QP_OPTPAR_LAG_TX_AFF,
 			[MLX5_QP_ST_UC] = MLX5_QP_OPTPAR_ALT_ADDR_PATH  |
 					  MLX5_QP_OPTPAR_RWE            |
-					  MLX5_QP_OPTPAR_PKEY_INDEX,
+					  MLX5_QP_OPTPAR_PKEY_INDEX	|
+					  MLX5_QP_OPTPAR_LAG_TX_AFF,
 			[MLX5_QP_ST_UD] = MLX5_QP_OPTPAR_PKEY_INDEX     |
 					  MLX5_QP_OPTPAR_Q_KEY,
 			[MLX5_QP_ST_MLX] = MLX5_QP_OPTPAR_PKEY_INDEX	|
@@ -3082,7 +3087,8 @@ static enum mlx5_qp_optpar opt_mask[MLX5_QP_NUM_STATE][MLX5_QP_NUM_STATE][MLX5_Q
 					  MLX5_QP_OPTPAR_RRE            |
 					  MLX5_QP_OPTPAR_RAE            |
 					  MLX5_QP_OPTPAR_RWE            |
-					  MLX5_QP_OPTPAR_PKEY_INDEX,
+					  MLX5_QP_OPTPAR_PKEY_INDEX	|
+					  MLX5_QP_OPTPAR_LAG_TX_AFF,
 		},
 	},
 	[MLX5_QP_STATE_RTR] = {
@@ -3435,11 +3441,8 @@ static unsigned int get_tx_affinity_rr(struct mlx5_ib_dev *dev,
 
 static bool qp_supports_affinity(struct ib_qp *qp)
 {
-	struct mlx5_ib_qp *mqp = to_mqp(qp);
-
 	if ((qp->qp_type == IB_QPT_RC) ||
-	    (qp->qp_type == IB_QPT_UD &&
-	     !(mqp->flags & MLX5_IB_QP_SQPN_QP1)) ||
+	    (qp->qp_type == IB_QPT_UD) ||
 	    (qp->qp_type == IB_QPT_UC) ||
 	    (qp->qp_type == IB_QPT_RAW_PACKET) ||
 	    (qp->qp_type == IB_QPT_XRC_INI) ||
@@ -3448,7 +3451,9 @@ static bool qp_supports_affinity(struct ib_qp *qp)
 	return false;
 }
 
-static unsigned int get_tx_affinity(struct ib_qp *qp, u8 init,
+static unsigned int get_tx_affinity(struct ib_qp *qp,
+				    const struct ib_qp_attr *attr,
+				    int attr_mask, u8 init,
 				    struct ib_udata *udata)
 {
 	struct mlx5_ib_ucontext *ucontext = rdma_udata_to_drv_context(
@@ -3458,10 +3463,19 @@ static unsigned int get_tx_affinity(struct ib_qp *qp, u8 init,
 	struct mlx5_ib_qp_base *qp_base;
 	unsigned int tx_affinity;
 
-	if (!(dev->lag_active && init && qp_supports_affinity(qp)))
+	if (!(dev->lag_active && qp_supports_affinity(qp)))
 		return 0;
 
-	tx_affinity = get_tx_affinity_rr(dev, udata);
+	if (mqp->flags & MLX5_IB_QP_SQPN_QP1)
+		tx_affinity = mqp->gsi_lag_port;
+	else if (init)
+		tx_affinity = get_tx_affinity_rr(dev, udata);
+	else if ((attr_mask & IB_QP_AV) && attr->ah_attr.roce.xmit_slave)
+		tx_affinity =
+			mlx5_lag_get_slave_port(dev->mdev,
+						attr->ah_attr.roce.xmit_slave);
+	else
+		return 0;
 
 	qp_base = &mqp->trans_qp.base;
 	if (ucontext)
@@ -3547,7 +3561,7 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 	struct mlx5_qp_context *context;
 	struct mlx5_ib_pd *pd;
 	enum mlx5_qp_state mlx5_cur, mlx5_new;
-	enum mlx5_qp_optpar optpar;
+	enum mlx5_qp_optpar optpar = 0;
 	u32 set_id = 0;
 	int mlx5_st;
 	int err;
@@ -3582,10 +3596,15 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 		}
 	}
 
-	tx_affinity = get_tx_affinity(ibqp,
+	tx_affinity = get_tx_affinity(ibqp, attr, attr_mask,
 				      cur_state == IB_QPS_RESET &&
 				      new_state == IB_QPS_INIT, udata);
-	context->flags |= cpu_to_be32(tx_affinity << 24);
+	if (tx_affinity) {
+		context->flags |= cpu_to_be32(tx_affinity << 24);
+		if (new_state == IB_QPS_RTR &&
+		    MLX5_CAP_GEN(dev->mdev, init2_lag_tx_port_affinity))
+			optpar |= MLX5_QP_OPTPAR_LAG_TX_AFF;
+	}
 
 	if (is_sqp(ibqp->qp_type)) {
 		context->mtu_msgmax = (IB_MTU_256 << 5) | 8;
@@ -3722,7 +3741,7 @@ static int __mlx5_ib_modify_qp(struct ib_qp *ibqp,
 	}
 
 	op = optab[mlx5_cur][mlx5_new];
-	optpar = ib_mask_to_mlx5_opt(attr_mask);
+	optpar |= ib_mask_to_mlx5_opt(attr_mask);
 	optpar &= opt_mask[mlx5_cur][mlx5_new][mlx5_st];
 
 	if (qp->ibqp.qp_type == IB_QPT_RAW_PACKET ||
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 69b27c7dfc3e..a3b6c92e889e 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1296,7 +1296,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         wol_p[0x1];
 
 	u8         stat_rate_support[0x10];
-	u8         reserved_at_1f0[0xc];
+	u8         reserved_at_1f0[0x8];
+	u8         init2_lag_tx_port_affinity[0x1];
+	u8         reserved_at_1f9[0x3];
 	u8         cqe_version[0x4];
 
 	u8         compact_address_vector[0x1];
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index ae63b1ae9004..fab88b0c76f9 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -66,6 +66,7 @@ enum mlx5_qp_optpar {
 	MLX5_QP_OPTPAR_RETRY_COUNT		= 1 << 12,
 	MLX5_QP_OPTPAR_RNR_RETRY		= 1 << 13,
 	MLX5_QP_OPTPAR_ACK_TIMEOUT		= 1 << 14,
+	MLX5_QP_OPTPAR_LAG_TX_AFF		= 1 << 15,
 	MLX5_QP_OPTPAR_PRI_PORT			= 1 << 16,
 	MLX5_QP_OPTPAR_SRQN			= 1 << 18,
 	MLX5_QP_OPTPAR_CQN_RCV			= 1 << 19,
@@ -315,6 +316,7 @@ struct mlx5_av {
 struct mlx5_ib_ah {
 	struct ib_ah		ibah;
 	struct mlx5_av		av;
+	u8			xmit_port;
 };
 
 static inline struct mlx5_ib_ah *to_mah(struct ib_ah *ibah)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get Maor Gottlieb
@ 2020-04-20 14:01   ` Jiri Pirko
  2020-04-20 17:29     ` David Ahern
  0 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2020-04-20 14:01 UTC (permalink / raw)
  To: Maor Gottlieb
  Cc: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba, leonro,
	saeedm, jiri, linux-rdma, netdev, alexr

Mon, Apr 20, 2020 at 09:54:17AM CEST, maorg@mellanox.com wrote:
>Add a new ndo to get the xmit slave of a master device.
>The user should release the slave when it is no longer needed.
>When the slave selection method is based on a hash, the user can ask to
>get the xmit slave under the assumption that all slaves can transmit,
>by setting the LAG_FLAGS_HASH_ALL_SLAVES bit in the flags argument.
>
>Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
>---
> include/linux/netdevice.h |  3 +++
> include/net/lag.h         | 32 ++++++++++++++++++++++++++++++++
> 2 files changed, 35 insertions(+)
>
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 130a668049ab..e8852f3ad0b6 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -1389,6 +1389,9 @@ struct net_device_ops {
> 						 struct netlink_ext_ack *extack);
> 	int			(*ndo_del_slave)(struct net_device *dev,
> 						 struct net_device *slave_dev);
>+	struct net_device*	(*ndo_xmit_get_slave)(struct net_device *master_dev,
>+						      struct sk_buff *skb,
>+						      u16 flags);

Please adjust the name to:
ndo_get_lag_xmit_slave



> 	netdev_features_t	(*ndo_fix_features)(struct net_device *dev,
> 						    netdev_features_t features);
> 	int			(*ndo_set_features)(struct net_device *dev,
>diff --git a/include/net/lag.h b/include/net/lag.h
>index 95b880e6fdde..c43b035989c4 100644
>--- a/include/net/lag.h
>+++ b/include/net/lag.h
>@@ -6,6 +6,38 @@
> #include <linux/if_team.h>
> #include <net/bonding.h>
> 
>+enum lag_get_slaves_flags {
>+	LAG_FLAGS_HASH_ALL_SLAVES = 1<<0

Enum name and the values should be in sync. Also sync with the ndo name.

Why exactly do you need these flags? Do you anticipate more of them?
A simple bool arg to the ndo would do I believe. Can be changed later if
needed.



>+};
>+
>+/**
>+ * master_xmit_slave_get - Get the xmit slave of master device
>+ * @skb: The packet
>+ * @flags: lag_get_slaves_flags
>+ *
>+ * This can be called from any context and does its own locking.
>+ * The returned handle has the usage count incremented and the caller must
>+ * use dev_put() to release it when it is no longer needed.
>+ * %NULL is returned if no slave is found.
>+ */
>+
>+static inline
>+struct net_device *master_xmit_get_slave(struct net_device *master_dev,

Please honor the namespace:
net_lag_get_xmit_slave

Also, just "struct net_device *dev" would be enough.



>+					 struct sk_buff *skb,
>+					 u16 flags)
>+{
>+	const struct net_device_ops *ops = master_dev->netdev_ops;
>+	struct net_device *slave = NULL;

"slave_dev" please.

Just check the ndo here and return NULL if it is not defined. That way
you avoid unnecessary NULL initialization and rcu lock.
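
Applied together, those two suggestions give the helper roughly the shape
below. This is a userspace model of the pattern (early return before any
locking, hold a reference on the returned device) — all type and function
names here are hypothetical stand-ins for illustration, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net_device / net_device_ops. */
struct net_device_model;

struct ndo_model {
	struct net_device_model *(*xmit_get_slave)(struct net_device_model *dev);
};

struct net_device_model {
	const struct ndo_model *ops;
	int refcnt;			/* models dev_hold()/dev_put() */
};

static struct net_device_model demo_slave;

static struct net_device_model *demo_pick(struct net_device_model *dev)
{
	(void)dev;
	return &demo_slave;
}

static const struct ndo_model demo_ops = { .xmit_get_slave = demo_pick };

static struct net_device_model *get_xmit_slave(struct net_device_model *dev)
{
	struct net_device_model *slave_dev;

	/* Early return: no ndo means no RCU lock and no NULL juggling. */
	if (!dev->ops || !dev->ops->xmit_get_slave)
		return NULL;

	/* rcu_read_lock() would go here in the kernel. */
	slave_dev = dev->ops->xmit_get_slave(dev);
	if (slave_dev)
		slave_dev->refcnt++;	/* dev_hold() */
	/* rcu_read_unlock() */
	return slave_dev;
}
```

A device whose ops do not implement the callback falls out immediately
with NULL; otherwise the callback's result is pinned before it is
returned to the caller.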


>+
>+	rcu_read_lock();
>+	if (ops->ndo_xmit_get_slave)
>+		slave = ops->ndo_xmit_get_slave(master_dev, skb, flags);
>+	if (slave)
>+		dev_hold(slave);
>+	rcu_read_unlock();
>+	return slave;
>+}
>+
> static inline bool net_lag_port_dev_txable(const struct net_device *port_dev)
> {
> 	if (netif_is_team_port(port_dev))
>-- 
>2.17.2
>


* Re: [PATCH V2 mlx5-next 02/10] bonding: Rename slave_arr to usable_slaves
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 02/10] bonding: Rename slave_arr to usable_slaves Maor Gottlieb
@ 2020-04-20 14:17   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2020-04-20 14:17 UTC (permalink / raw)
  To: Maor Gottlieb
  Cc: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba, leonro,
	saeedm, jiri, linux-rdma, netdev, alexr

Mon, Apr 20, 2020 at 09:54:18AM CEST, maorg@mellanox.com wrote:
>This patch renames slave_arr to usable_slaves, since we will
>have two arrays, one for the usable slaves and the other for all
>slaves. In addition, it exports the bond_skip_slave logic into a function.

In the patch description, you should tell the codebase what to do. You
should not talk about "this patch".

>
>Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
>---
> drivers/net/bonding/bond_alb.c  |  4 +-
> drivers/net/bonding/bond_main.c | 85 +++++++++++++++++----------------
> include/net/bonding.h           |  2 +-
> 3 files changed, 48 insertions(+), 43 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index c81698550e5a..7bb49b049dcc 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -1360,7 +1360,7 @@ netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
> 				struct bond_up_slave *slaves;
> 				unsigned int count;
> 
>-				slaves = rcu_dereference(bond->slave_arr);
>+				slaves = rcu_dereference(bond->usable_slaves);
> 				count = slaves ? READ_ONCE(slaves->count) : 0;
> 				if (likely(count))
> 					tx_slave = slaves->arr[hash_index %
>@@ -1494,7 +1494,7 @@ netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
> 			struct bond_up_slave *slaves;
> 			unsigned int count;
> 
>-			slaves = rcu_dereference(bond->slave_arr);
>+			slaves = rcu_dereference(bond->usable_slaves);
> 			count = slaves ? READ_ONCE(slaves->count) : 0;
> 			if (likely(count))
> 				tx_slave = slaves->arr[bond_xmit_hash(bond, skb) %
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 2e70e43c5df5..2cb41d480ae2 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -4087,6 +4087,29 @@ static void bond_slave_arr_handler(struct work_struct *work)
> 	bond_slave_arr_work_rearm(bond, 1);
> }
> 
>+static void bond_skip_slave(struct bond_up_slave *slaves,
>+			    struct slave *skipslave)
>+{
>+	int idx;
>+
>+	/* Rare situation where caller has asked to skip a specific
>+	 * slave but allocation failed (most likely!). BTW this is
>+	 * only possible when the call is initiated from
>+	 * __bond_release_one(). In this situation; overwrite the
>+	 * skipslave entry in the array with the last entry from the
>+	 * array to avoid a situation where the xmit path may choose
>+	 * this to-be-skipped slave to send a packet out.
>+	 */
>+	for (idx = 0; slaves && idx < slaves->count; idx++) {
>+		if (skipslave == slaves->arr[idx]) {
>+			slaves->arr[idx] =
>+				slaves->arr[slaves->count - 1];
>+			slaves->count--;
>+			break;
>+		}
>+	}

Do this move in a separate patch. It is not related to the rename.
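
The logic being factored out is a plain swap-with-last removal. As a
standalone sketch (with a stand-in type, not the kernel's
struct bond_up_slave):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct bond_up_slave: a counted array of
 * opaque slave pointers. */
struct slave_arr {
	size_t count;
	void *arr[8];
};

/* Drop 'skip' from the array, if present, by overwriting it with the
 * last entry: O(1) removal that does not preserve order, which is
 * acceptable because the xmit path only indexes the array by hash. */
static void skip_slave(struct slave_arr *slaves, void *skip)
{
	size_t idx;

	if (!slaves)
		return;
	for (idx = 0; idx < slaves->count; idx++) {
		if (slaves->arr[idx] == skip) {
			slaves->arr[idx] = slaves->arr[slaves->count - 1];
			slaves->count--;
			break;
		}
	}
}
```

If the slave is absent the array is left untouched, matching the
"rare allocation-failure" case the comment in the patch describes.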


>+}
>+
> /* Build the usable slaves array in control path for modes that use xmit-hash
>  * to determine the slave interface -
>  * (a) BOND_MODE_8023AD
>@@ -4097,9 +4120,9 @@ static void bond_slave_arr_handler(struct work_struct *work)
>  */
> int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> {
>+	struct bond_up_slave *usable_slaves, *old_usable_slaves;
> 	struct slave *slave;
> 	struct list_head *iter;
>-	struct bond_up_slave *new_arr, *old_arr;
> 	int agg_id = 0;
> 	int ret = 0;
> 
>@@ -4107,11 +4130,10 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> 	WARN_ON(lockdep_is_held(&bond->mode_lock));
> #endif
> 
>-	new_arr = kzalloc(offsetof(struct bond_up_slave, arr[bond->slave_cnt]),
>-			  GFP_KERNEL);
>-	if (!new_arr) {
>+	usable_slaves = kzalloc(struct_size(usable_slaves, arr,
>+					    bond->slave_cnt), GFP_KERNEL);
>+	if (!usable_slaves) {
> 		ret = -ENOMEM;
>-		pr_err("Failed to build slave-array.\n");
> 		goto out;
> 	}
> 	if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>@@ -4119,14 +4141,14 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> 
> 		if (bond_3ad_get_active_agg_info(bond, &ad_info)) {
> 			pr_debug("bond_3ad_get_active_agg_info failed\n");
>-			kfree_rcu(new_arr, rcu);
>+			kfree_rcu(usable_slaves, rcu);
> 			/* No active aggragator means it's not safe to use
> 			 * the previous array.
> 			 */
>-			old_arr = rtnl_dereference(bond->slave_arr);
>-			if (old_arr) {
>-				RCU_INIT_POINTER(bond->slave_arr, NULL);
>-				kfree_rcu(old_arr, rcu);
>+			old_usable_slaves = rtnl_dereference(bond->usable_slaves);
>+			if (old_usable_slaves) {
>+				RCU_INIT_POINTER(bond->usable_slaves, NULL);
>+				kfree_rcu(old_usable_slaves, rcu);
> 			}
> 			goto out;
> 		}
>@@ -4146,37 +4168,20 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> 			continue;
> 
> 		slave_dbg(bond->dev, slave->dev, "Adding slave to tx hash array[%d]\n",
>-			  new_arr->count);
>+			  usable_slaves->count);
> 
>-		new_arr->arr[new_arr->count++] = slave;
>+		usable_slaves->arr[usable_slaves->count++] = slave;
> 	}
> 
>-	old_arr = rtnl_dereference(bond->slave_arr);
>-	rcu_assign_pointer(bond->slave_arr, new_arr);
>-	if (old_arr)
>-		kfree_rcu(old_arr, rcu);
>+	old_usable_slaves = rtnl_dereference(bond->usable_slaves);
>+	rcu_assign_pointer(bond->usable_slaves, usable_slaves);
>+	if (old_usable_slaves)
>+		kfree_rcu(old_usable_slaves, rcu);
> out:
>-	if (ret != 0 && skipslave) {
>-		int idx;
>-
>-		/* Rare situation where caller has asked to skip a specific
>-		 * slave but allocation failed (most likely!). BTW this is
>-		 * only possible when the call is initiated from
>-		 * __bond_release_one(). In this situation; overwrite the
>-		 * skipslave entry in the array with the last entry from the
>-		 * array to avoid a situation where the xmit path may choose
>-		 * this to-be-skipped slave to send a packet out.
>-		 */
>-		old_arr = rtnl_dereference(bond->slave_arr);
>-		for (idx = 0; old_arr != NULL && idx < old_arr->count; idx++) {
>-			if (skipslave == old_arr->arr[idx]) {
>-				old_arr->arr[idx] =
>-				    old_arr->arr[old_arr->count-1];
>-				old_arr->count--;
>-				break;
>-			}
>-		}
>-	}
>+	if (ret != 0 && skipslave)
>+		bond_skip_slave(rtnl_dereference(bond->usable_slaves),
>+				skipslave);
>+
> 	return ret;
> }
> 
>@@ -4192,7 +4197,7 @@ static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb,
> 	struct bond_up_slave *slaves;
> 	unsigned int count;
> 
>-	slaves = rcu_dereference(bond->slave_arr);
>+	slaves = rcu_dereference(bond->usable_slaves);
> 	count = slaves ? READ_ONCE(slaves->count) : 0;
> 	if (likely(count)) {
> 		slave = slaves->arr[bond_xmit_hash(bond, skb) % count];
>@@ -4483,9 +4488,9 @@ static void bond_uninit(struct net_device *bond_dev)
> 		__bond_release_one(bond_dev, slave->dev, true, true);
> 	netdev_info(bond_dev, "Released all slaves\n");
> 
>-	arr = rtnl_dereference(bond->slave_arr);
>+	arr = rtnl_dereference(bond->usable_slaves);
> 	if (arr) {
>-		RCU_INIT_POINTER(bond->slave_arr, NULL);
>+		RCU_INIT_POINTER(bond->usable_slaves, NULL);
> 		kfree_rcu(arr, rcu);
> 	}
> 
>diff --git a/include/net/bonding.h b/include/net/bonding.h
>index dc2ce31a1f52..33bdb6d5182d 100644
>--- a/include/net/bonding.h
>+++ b/include/net/bonding.h
>@@ -200,7 +200,7 @@ struct bonding {
> 	struct   slave __rcu *curr_active_slave;
> 	struct   slave __rcu *current_arp_slave;
> 	struct   slave __rcu *primary_slave;
>-	struct   bond_up_slave __rcu *slave_arr; /* Array of usable slaves */
>+	struct   bond_up_slave __rcu *usable_slaves; /* Array of usable slaves */
> 	bool     force_primary;
> 	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
> 	int     (*recv_probe)(const struct sk_buff *, struct bonding *,
>-- 
>2.17.2
>


* Re: [PATCH V2 mlx5-next 03/10] bonding: Add helpers to get xmit slave
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 03/10] bonding: Add helpers to get xmit slave Maor Gottlieb
@ 2020-04-20 14:27   ` Jiri Pirko
  2020-04-20 18:46     ` Jay Vosburgh
  0 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2020-04-20 14:27 UTC (permalink / raw)
  To: Maor Gottlieb
  Cc: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba, leonro,
	saeedm, jiri, linux-rdma, netdev, alexr

Mon, Apr 20, 2020 at 09:54:19AM CEST, maorg@mellanox.com wrote:
>These helpers will be used by both the xmit function
>and the get xmit slave ndo.

Be more verbose about what you are doing please. From this I have no
clue what is going on.


>
>Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
>---
> drivers/net/bonding/bond_alb.c  | 35 ++++++++----
> drivers/net/bonding/bond_main.c | 94 +++++++++++++++++++++------------
> include/net/bond_alb.h          |  4 ++
> 3 files changed, 89 insertions(+), 44 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 7bb49b049dcc..e863c694c309 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -1334,11 +1334,11 @@ static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
> 	return NETDEV_TX_OK;
> }
> 
>-netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>+struct slave *bond_xmit_tlb_slave_get(struct bonding *bond,
>+				      struct sk_buff *skb)
> {
>-	struct bonding *bond = netdev_priv(bond_dev);
>-	struct ethhdr *eth_data;
> 	struct slave *tx_slave = NULL;
>+	struct ethhdr *eth_data;
> 	u32 hash_index;
> 
> 	skb_reset_mac_header(skb);
>@@ -1369,20 +1369,29 @@ netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
> 			break;
> 		}
> 	}
>-	return bond_do_alb_xmit(skb, bond, tx_slave);
>+	return tx_slave;
> }
> 
>-netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>+netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
> {
> 	struct bonding *bond = netdev_priv(bond_dev);
>-	struct ethhdr *eth_data;
>+	struct slave *tx_slave;
>+
>+	tx_slave = bond_xmit_tlb_slave_get(bond, skb);
>+	return bond_do_alb_xmit(skb, bond, tx_slave);
>+}
>+
>+struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
>+				      struct sk_buff *skb)
>+{
> 	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>-	struct slave *tx_slave = NULL;
> 	static const __be32 ip_bcast = htonl(0xffffffff);
>-	int hash_size = 0;
>+	struct slave *tx_slave = NULL;
>+	const u8 *hash_start = NULL;
> 	bool do_tx_balance = true;
>+	struct ethhdr *eth_data;
> 	u32 hash_index = 0;
>-	const u8 *hash_start = NULL;
>+	int hash_size = 0;
> 
> 	skb_reset_mac_header(skb);
> 	eth_data = eth_hdr(skb);
>@@ -1501,7 +1510,15 @@ netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
> 						       count];
> 		}
> 	}
>+	return tx_slave;
>+}
>+
>+netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>+{
>+	struct bonding *bond = netdev_priv(bond_dev);
>+	struct slave *tx_slave = NULL;
> 
>+	tx_slave = bond_xmit_alb_slave_get(bond, skb);
> 	return bond_do_alb_xmit(skb, bond, tx_slave);
> }
> 
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 2cb41d480ae2..7e04be86fda8 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -82,6 +82,7 @@
> #include <net/bonding.h>
> #include <net/bond_3ad.h>
> #include <net/bond_alb.h>
>+#include <net/lag.h>
> 
> #include "bonding_priv.h"
> 
>@@ -3406,10 +3407,26 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
> 		(__force u32)flow_get_u32_src(&flow);
> 	hash ^= (hash >> 16);
> 	hash ^= (hash >> 8);
>-

Please avoid changes like this one.


> 	return hash >> 1;
> }
> 
>+static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond,
>+						 struct sk_buff *skb,
>+						 struct bond_up_slave *slaves)
>+{
>+	struct slave *slave;
>+	unsigned int count;
>+	u32 hash;
>+
>+	hash = bond_xmit_hash(bond, skb);
>+	count = slaves ? READ_ONCE(slaves->count) : 0;
>+	if (unlikely(!count))
>+		return NULL;
>+
>+	slave = slaves->arr[hash % count];
>+	return slave;
>+}

Why don't you have this helper near bond_3ad_xor_xmit(), as you do for
round robin, for example?

I think it would make this patch much easier to review if you split it
into multiple patches, one per mode.
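
For reference, the core of the 802.3ad/XOR selection being factored out
is just "hash modulo slave count" with an empty-array guard; roughly (a
sketch with a stand-in type, and with the kernel's READ_ONCE() on the
count elided):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct bond_up_slave. */
struct up_slaves {
	size_t count;
	void *arr[8];
};

/* Pick the xmit slave for hash-based modes: index the slave array by
 * hash % count, returning NULL when the array is missing or empty. */
static void *xor_slave_get(const struct up_slaves *slaves, uint32_t hash)
{
	size_t count = slaves ? slaves->count : 0;

	if (count == 0)
		return NULL;
	return slaves->arr[hash % count];
}
```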


>+
> /*-------------------------- Device entry points ----------------------------*/
> 
> void bond_work_init_all(struct bonding *bond)
>@@ -3923,16 +3940,15 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
> }
> 
> /**
>- * bond_xmit_slave_id - transmit skb through slave with slave_id
>+ * bond_get_slave_by_id - get xmit slave with slave_id
>  * @bond: bonding device that is transmitting
>- * @skb: buffer to transmit
>  * @slave_id: slave id up to slave_cnt-1 through which to transmit
>  *
>- * This function tries to transmit through slave with slave_id but in case
>+ * This function tries to get slave with slave_id but in case
>  * it fails, it tries to find the first available slave for transmission.
>- * The skb is consumed in all cases, thus the function is void.
>  */
>-static void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int slave_id)
>+static struct slave *bond_get_slave_by_id(struct bonding *bond,
>+					  int slave_id)
> {
> 	struct list_head *iter;
> 	struct slave *slave;
>@@ -3941,10 +3957,8 @@ static void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int sl
> 	/* Here we start from the slave with slave_id */
> 	bond_for_each_slave_rcu(bond, slave, iter) {
> 		if (--i < 0) {
>-			if (bond_slave_can_tx(slave)) {
>-				bond_dev_queue_xmit(bond, skb, slave->dev);
>-				return;
>-			}
>+			if (bond_slave_can_tx(slave))
>+				return slave;
> 		}
> 	}
> 
>@@ -3953,13 +3967,11 @@ static void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int sl
> 	bond_for_each_slave_rcu(bond, slave, iter) {
> 		if (--i < 0)
> 			break;
>-		if (bond_slave_can_tx(slave)) {
>-			bond_dev_queue_xmit(bond, skb, slave->dev);
>-			return;
>-		}
>+		if (bond_slave_can_tx(slave))
>+			return slave;
> 	}
>-	/* no slave that can tx has been found */
>-	bond_tx_drop(bond->dev, skb);
>+
>+	return NULL;
> }
> 
> /**
>@@ -3995,10 +4007,9 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
> 	return slave_id;
> }
> 
>-static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
>-					struct net_device *bond_dev)
>+static struct slave *bond_xmit_roundrobin_slave_get(struct bonding *bond,
>+						    struct sk_buff *skb)
> {
>-	struct bonding *bond = netdev_priv(bond_dev);
> 	struct slave *slave;
> 	int slave_cnt;
> 	u32 slave_id;
>@@ -4020,24 +4031,40 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
> 		if (iph->protocol == IPPROTO_IGMP) {
> 			slave = rcu_dereference(bond->curr_active_slave);
> 			if (slave)
>-				bond_dev_queue_xmit(bond, skb, slave->dev);
>-			else
>-				bond_xmit_slave_id(bond, skb, 0);
>-			return NETDEV_TX_OK;
>+				return slave;
>+			return bond_get_slave_by_id(bond, 0);
> 		}
> 	}
> 
> non_igmp:
> 	slave_cnt = READ_ONCE(bond->slave_cnt);
> 	if (likely(slave_cnt)) {
>-		slave_id = bond_rr_gen_slave_id(bond);
>-		bond_xmit_slave_id(bond, skb, slave_id % slave_cnt);
>-	} else {
>-		bond_tx_drop(bond_dev, skb);
>+		slave_id = bond_rr_gen_slave_id(bond) % slave_cnt;
>+		return bond_get_slave_by_id(bond, slave_id);
> 	}
>+	return NULL;
>+}
>+
>+static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
>+					struct net_device *bond_dev)
>+{
>+	struct bonding *bond = netdev_priv(bond_dev);
>+	struct slave *slave;
>+
>+	slave = bond_xmit_roundrobin_slave_get(bond, skb);
>+	if (slave)
>+		bond_dev_queue_xmit(bond, skb, slave->dev);
>+	else
>+		bond_tx_drop(bond_dev, skb);
> 	return NETDEV_TX_OK;
> }
> 
>+static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
>+						      struct sk_buff *skb)
>+{
>+	return rcu_dereference(bond->curr_active_slave);
>+}
>+
> /* In active-backup mode, we know that bond->curr_active_slave is always valid if
>  * the bond has a usable interface.
>  */
>@@ -4047,7 +4074,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
> 	struct bonding *bond = netdev_priv(bond_dev);
> 	struct slave *slave;
> 
>-	slave = rcu_dereference(bond->curr_active_slave);
>+	slave = bond_xmit_activebackup_slave_get(bond, skb);
> 	if (slave)
> 		bond_dev_queue_xmit(bond, skb, slave->dev);
> 	else
>@@ -4193,18 +4220,15 @@ static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb,
> 				     struct net_device *dev)
> {
> 	struct bonding *bond = netdev_priv(dev);
>-	struct slave *slave;
> 	struct bond_up_slave *slaves;
>-	unsigned int count;
>+	struct slave *slave;
> 
> 	slaves = rcu_dereference(bond->usable_slaves);
>-	count = slaves ? READ_ONCE(slaves->count) : 0;
>-	if (likely(count)) {
>-		slave = slaves->arr[bond_xmit_hash(bond, skb) % count];
>+	slave = bond_xmit_3ad_xor_slave_get(bond, skb, slaves);
>+	if (likely(slave))
> 		bond_dev_queue_xmit(bond, skb, slave->dev);
>-	} else {
>+	else
> 		bond_tx_drop(dev, skb);
>-	}
> 
> 	return NETDEV_TX_OK;
> }
>diff --git a/include/net/bond_alb.h b/include/net/bond_alb.h
>index b3504fcd773d..f6af76c87a6c 100644
>--- a/include/net/bond_alb.h
>+++ b/include/net/bond_alb.h
>@@ -158,6 +158,10 @@ void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char
> void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave);
> int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev);
> int bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev);
>+struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
>+				      struct sk_buff *skb);
>+struct slave *bond_xmit_tlb_slave_get(struct bonding *bond,
>+				      struct sk_buff *skb);
> void bond_alb_monitor(struct work_struct *);
> int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr);
> void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id);
>-- 
>2.17.2
>


* Re: [PATCH V2 mlx5-next 04/10] bonding: Implement ndo_xmit_slave_get
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 04/10] bonding: Implement ndo_xmit_slave_get Maor Gottlieb
@ 2020-04-20 15:04   ` Jiri Pirko
  2020-04-20 15:36   ` David Ahern
  1 sibling, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2020-04-20 15:04 UTC (permalink / raw)
  To: Maor Gottlieb
  Cc: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba, leonro,
	saeedm, jiri, linux-rdma, netdev, alexr

Mon, Apr 20, 2020 at 09:54:20AM CEST, maorg@mellanox.com wrote:
>Add an implementation of ndo_xmit_slave_get.
>When the user sets the LAG_FLAGS_HASH_ALL_SLAVES bit and the xmit slave
>result is based on the hash, the slave will be selected from the
>array of all the slaves.
>
>Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
>---
> drivers/net/bonding/bond_main.c | 123 +++++++++++++++++++++++++++-----
> include/net/bonding.h           |   1 +
> 2 files changed, 105 insertions(+), 19 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 7e04be86fda8..320bcb1394fd 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -4137,6 +4137,40 @@ static void bond_skip_slave(struct bond_up_slave *slaves,
> 	}
> }
> 
>+static void bond_set_slave_arr(struct bonding *bond,
>+			       struct bond_up_slave *usable_slaves,
>+			       struct bond_up_slave *all_slaves)
>+{
>+	struct bond_up_slave *usable, *all;
>+
>+	usable = rtnl_dereference(bond->usable_slaves);
>+	rcu_assign_pointer(bond->usable_slaves, usable_slaves);
>+	if (usable)
>+		kfree_rcu(usable, rcu);
>+
>+	all = rtnl_dereference(bond->all_slaves);
>+	rcu_assign_pointer(bond->all_slaves, all_slaves);
>+	if (all)
>+		kfree_rcu(all, rcu);
>+}
>+
>+static void bond_reset_slave_arr(struct bonding *bond)
>+{
>+	struct bond_up_slave *usable, *all;
>+
>+	usable = rtnl_dereference(bond->usable_slaves);
>+	if (usable) {
>+		RCU_INIT_POINTER(bond->usable_slaves, NULL);
>+		kfree_rcu(usable, rcu);
>+	}
>+
>+	all = rtnl_dereference(bond->all_slaves);
>+	if (all) {
>+		RCU_INIT_POINTER(bond->all_slaves, NULL);
>+		kfree_rcu(all, rcu);
>+	}
>+}

Could you please push addition of all_slaves arr into a separate patch?
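
The new bond_set_slave_arr()/bond_reset_slave_arr() pair both follow the
same publish-and-free sequence. Modelled in userspace (the RCU
primitives are elided and noted in comments; the type is a stand-in):

```c
#include <assert.h>
#include <stdlib.h>

/* Holder for the published array pointer (models an __rcu pointer). */
struct slaves_pub {
	void *current;
};

/* Replace the published array and free the old one. The kernel
 * sequence is rtnl_dereference() + rcu_assign_pointer() + kfree_rcu();
 * this model elides the RCU grace period before the free. */
static void set_slave_arr(struct slaves_pub *pub, void *new_arr)
{
	void *old = pub->current;	/* rtnl_dereference() */

	pub->current = new_arr;		/* rcu_assign_pointer() */
	free(old);			/* kfree_rcu(); free(NULL) is a no-op */
}
```

Passing NULL as the new array gives the "reset" variant, which is why
the two kernel helpers share so much code.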


>+
> /* Build the usable slaves array in control path for modes that use xmit-hash
>  * to determine the slave interface -
>  * (a) BOND_MODE_8023AD
>@@ -4147,7 +4181,7 @@ static void bond_skip_slave(struct bond_up_slave *slaves,
>  */
> int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> {
>-	struct bond_up_slave *usable_slaves, *old_usable_slaves;
>+	struct bond_up_slave *usable_slaves = NULL, *all_slaves = NULL;
> 	struct slave *slave;
> 	struct list_head *iter;
> 	int agg_id = 0;
>@@ -4159,7 +4193,9 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> 
> 	usable_slaves = kzalloc(struct_size(usable_slaves, arr,
> 					    bond->slave_cnt), GFP_KERNEL);
>-	if (!usable_slaves) {
>+	all_slaves = kzalloc(struct_size(all_slaves, arr,
>+					 bond->slave_cnt), GFP_KERNEL);
>+	if (!usable_slaves || !all_slaves) {
> 		ret = -ENOMEM;
> 		goto out;
> 	}
>@@ -4168,20 +4204,19 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> 
> 		if (bond_3ad_get_active_agg_info(bond, &ad_info)) {
> 			pr_debug("bond_3ad_get_active_agg_info failed\n");
>-			kfree_rcu(usable_slaves, rcu);
> 			/* No active aggragator means it's not safe to use
> 			 * the previous array.
> 			 */
>-			old_usable_slaves = rtnl_dereference(bond->usable_slaves);
>-			if (old_usable_slaves) {
>-				RCU_INIT_POINTER(bond->usable_slaves, NULL);
>-				kfree_rcu(old_usable_slaves, rcu);
>-			}
>+			bond_reset_slave_arr(bond);
> 			goto out;
> 		}
> 		agg_id = ad_info.aggregator_id;
> 	}
> 	bond_for_each_slave(bond, slave, iter) {
>+		if (skipslave == slave)
>+			continue;
>+
>+		all_slaves->arr[all_slaves->count++] = slave;
> 		if (BOND_MODE(bond) == BOND_MODE_8023AD) {
> 			struct aggregator *agg;
> 
>@@ -4191,8 +4226,6 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> 		}
> 		if (!bond_slave_can_tx(slave))
> 			continue;
>-		if (skipslave == slave)
>-			continue;
> 
> 		slave_dbg(bond->dev, slave->dev, "Adding slave to tx hash array[%d]\n",
> 			  usable_slaves->count);
>@@ -4200,14 +4233,17 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> 		usable_slaves->arr[usable_slaves->count++] = slave;
> 	}
> 
>-	old_usable_slaves = rtnl_dereference(bond->usable_slaves);
>-	rcu_assign_pointer(bond->usable_slaves, usable_slaves);
>-	if (old_usable_slaves)
>-		kfree_rcu(old_usable_slaves, rcu);
>+	bond_set_slave_arr(bond, usable_slaves, all_slaves);
>+	return ret;
> out:
>-	if (ret != 0 && skipslave)
>+	if (ret != 0 && skipslave) {
>+		bond_skip_slave(rtnl_dereference(bond->all_slaves),
>+				skipslave);
> 		bond_skip_slave(rtnl_dereference(bond->usable_slaves),
> 				skipslave);
>+	}
>+	kfree_rcu(all_slaves, rcu);
>+	kfree_rcu(usable_slaves, rcu);
> 
> 	return ret;
> }
>@@ -4313,6 +4349,48 @@ static u16 bond_select_queue(struct net_device *dev, struct sk_buff *skb,
> 	return txq;
> }
> 
>+static struct net_device *bond_xmit_get_slave(struct net_device *master_dev,
>+					      struct sk_buff *skb,
>+					      u16 flags)
>+{
>+	struct bonding *bond = netdev_priv(master_dev);
>+	struct bond_up_slave *slaves;
>+	struct slave *slave = NULL;
>+
>+	switch (BOND_MODE(bond)) {
>+	case BOND_MODE_ROUNDROBIN:
>+		slave = bond_xmit_roundrobin_slave_get(bond, skb);
>+		break;
>+	case BOND_MODE_ACTIVEBACKUP:
>+		slave = bond_xmit_activebackup_slave_get(bond, skb);
>+		break;
>+	case BOND_MODE_8023AD:
>+	case BOND_MODE_XOR:
>+		if (flags & LAG_FLAGS_HASH_ALL_SLAVES)
>+			slaves = rcu_dereference(bond->all_slaves);
>+		else
>+			slaves = rcu_dereference(bond->usable_slaves);
>+		slave = bond_xmit_3ad_xor_slave_get(bond, skb, slaves);
>+		break;
>+	case BOND_MODE_BROADCAST:
>+		break;
>+	case BOND_MODE_ALB:
>+		slave = bond_xmit_alb_slave_get(bond, skb);
>+		break;
>+	case BOND_MODE_TLB:
>+		slave = bond_xmit_tlb_slave_get(bond, skb);
>+		break;
>+	default:
>+		/* Should never happen, mode already checked */
>+		WARN_ONCE(true, "Unknown bonding mode");

Return NULL here right away; there is no need to initialize slave to
NULL at the beginning.


>+		break;
>+	}
>+
>+	if (slave)
>+		return slave->dev;
>+	return NULL;
>+}
>+
> static netdev_tx_t __bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
> {
> 	struct bonding *bond = netdev_priv(dev);
>@@ -4434,6 +4512,7 @@ static const struct net_device_ops bond_netdev_ops = {
> 	.ndo_del_slave		= bond_release,
> 	.ndo_fix_features	= bond_fix_features,
> 	.ndo_features_check	= passthru_features_check,
>+	.ndo_xmit_get_slave	= bond_xmit_get_slave,
> };
> 
> static const struct device_type bond_type = {
>@@ -4501,9 +4580,9 @@ void bond_setup(struct net_device *bond_dev)
> static void bond_uninit(struct net_device *bond_dev)
> {
> 	struct bonding *bond = netdev_priv(bond_dev);
>+	struct bond_up_slave *usable, *all;
> 	struct list_head *iter;
> 	struct slave *slave;
>-	struct bond_up_slave *arr;
> 
> 	bond_netpoll_cleanup(bond_dev);
> 
>@@ -4512,10 +4591,16 @@ static void bond_uninit(struct net_device *bond_dev)
> 		__bond_release_one(bond_dev, slave->dev, true, true);
> 	netdev_info(bond_dev, "Released all slaves\n");
> 
>-	arr = rtnl_dereference(bond->usable_slaves);
>-	if (arr) {
>+	usable = rtnl_dereference(bond->usable_slaves);
>+	if (usable) {
> 		RCU_INIT_POINTER(bond->usable_slaves, NULL);
>-		kfree_rcu(arr, rcu);
>+		kfree_rcu(usable, rcu);
>+	}
>+
>+	all = rtnl_dereference(bond->all_slaves);
>+	if (all) {
>+		RCU_INIT_POINTER(bond->all_slaves, NULL);
>+		kfree_rcu(all, rcu);
> 	}
> 
> 	list_del(&bond->bond_list);
>diff --git a/include/net/bonding.h b/include/net/bonding.h
>index 33bdb6d5182d..a2a7f461fa63 100644
>--- a/include/net/bonding.h
>+++ b/include/net/bonding.h
>@@ -201,6 +201,7 @@ struct bonding {
> 	struct   slave __rcu *current_arp_slave;
> 	struct   slave __rcu *primary_slave;
> 	struct   bond_up_slave __rcu *usable_slaves; /* Array of usable slaves */
>+	struct   bond_up_slave __rcu *all_slaves; /* Array of all slaves */
> 	bool     force_primary;
> 	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
> 	int     (*recv_probe)(const struct sk_buff *, struct bonding *,
>-- 
>2.17.2
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V2 mlx5-next 04/10] bonding: Implement ndo_xmit_slave_get
  2020-04-20  7:54 ` [PATCH V2 mlx5-next 04/10] bonding: Implement ndo_xmit_slave_get Maor Gottlieb
  2020-04-20 15:04   ` Jiri Pirko
@ 2020-04-20 15:36   ` David Ahern
  1 sibling, 0 replies; 31+ messages in thread
From: David Ahern @ 2020-04-20 15:36 UTC (permalink / raw)
  To: Maor Gottlieb, davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba
  Cc: leonro, saeedm, jiri, linux-rdma, netdev, alexr

On 4/20/20 1:54 AM, Maor Gottlieb wrote:
> Add implementation of ndo_xmit_slave_get.
> When the user sets the LAG_FLAGS_HASH_ALL_SLAVES bit and the xmit slave
> selection is based on the hash, the slave will be selected from the
> array of all slaves.
> 
> Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
> ---
>  drivers/net/bonding/bond_main.c | 123 +++++++++++++++++++++++++++-----
>  include/net/bonding.h           |   1 +
>  2 files changed, 105 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 7e04be86fda8..320bcb1394fd 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4137,6 +4137,40 @@ static void bond_skip_slave(struct bond_up_slave *slaves,
>  	}
>  }
>  
> +static void bond_set_slave_arr(struct bonding *bond,
> +			       struct bond_up_slave *usable_slaves,
> +			       struct bond_up_slave *all_slaves)
> +{
> +	struct bond_up_slave *usable, *all;
> +
> +	usable = rtnl_dereference(bond->usable_slaves);
> +	rcu_assign_pointer(bond->usable_slaves, usable_slaves);
> +	if (usable)
> +		kfree_rcu(usable, rcu);
> +
> +	all = rtnl_dereference(bond->all_slaves);
> +	rcu_assign_pointer(bond->all_slaves, all_slaves);
> +	if (all)
> +		kfree_rcu(all, rcu);
> +}
> +
> +static void bond_reset_slave_arr(struct bonding *bond)
> +{
> +	struct bond_up_slave *usable, *all;
> +
> +	usable = rtnl_dereference(bond->usable_slaves);
> +	if (usable) {
> +		RCU_INIT_POINTER(bond->usable_slaves, NULL);
> +		kfree_rcu(usable, rcu);
> +	}
> +
> +	all = rtnl_dereference(bond->all_slaves);
> +	if (all) {
> +		RCU_INIT_POINTER(bond->all_slaves, NULL);
> +		kfree_rcu(all, rcu);
> +	}
> +}
> +
>  /* Build the usable slaves array in control path for modes that use xmit-hash
>   * to determine the slave interface -
>   * (a) BOND_MODE_8023AD
> @@ -4147,7 +4181,7 @@ static void bond_skip_slave(struct bond_up_slave *slaves,
>   */
>  int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
>  {
> -	struct bond_up_slave *usable_slaves, *old_usable_slaves;
> +	struct bond_up_slave *usable_slaves = NULL, *all_slaves = NULL;
>  	struct slave *slave;
>  	struct list_head *iter;
>  	int agg_id = 0;
> @@ -4159,7 +4193,9 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
>  
>  	usable_slaves = kzalloc(struct_size(usable_slaves, arr,
>  					    bond->slave_cnt), GFP_KERNEL);
> -	if (!usable_slaves) {
> +	all_slaves = kzalloc(struct_size(all_slaves, arr,
> +					 bond->slave_cnt), GFP_KERNEL);
> +	if (!usable_slaves || !all_slaves) {
>  		ret = -ENOMEM;
>  		goto out;
>  	}
> @@ -4168,20 +4204,19 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
>  
>  		if (bond_3ad_get_active_agg_info(bond, &ad_info)) {
>  			pr_debug("bond_3ad_get_active_agg_info failed\n");
> -			kfree_rcu(usable_slaves, rcu);
>  			/* No active aggragator means it's not safe to use
>  			 * the previous array.
>  			 */
> -			old_usable_slaves = rtnl_dereference(bond->usable_slaves);
> -			if (old_usable_slaves) {
> -				RCU_INIT_POINTER(bond->usable_slaves, NULL);
> -				kfree_rcu(old_usable_slaves, rcu);
> -			}
> +			bond_reset_slave_arr(bond);
>  			goto out;
>  		}
>  		agg_id = ad_info.aggregator_id;
>  	}
>  	bond_for_each_slave(bond, slave, iter) {
> +		if (skipslave == slave)
> +			continue;
> +
> +		all_slaves->arr[all_slaves->count++] = slave;
>  		if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>  			struct aggregator *agg;
>  
> @@ -4191,8 +4226,6 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
>  		}
>  		if (!bond_slave_can_tx(slave))
>  			continue;
> -		if (skipslave == slave)
> -			continue;
>  
>  		slave_dbg(bond->dev, slave->dev, "Adding slave to tx hash array[%d]\n",
>  			  usable_slaves->count);
> @@ -4200,14 +4233,17 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
>  		usable_slaves->arr[usable_slaves->count++] = slave;
>  	}
>  
> -	old_usable_slaves = rtnl_dereference(bond->usable_slaves);
> -	rcu_assign_pointer(bond->usable_slaves, usable_slaves);
> -	if (old_usable_slaves)
> -		kfree_rcu(old_usable_slaves, rcu);
> +	bond_set_slave_arr(bond, usable_slaves, all_slaves);
> +	return ret;
>  out:
> -	if (ret != 0 && skipslave)
> +	if (ret != 0 && skipslave) {
> +		bond_skip_slave(rtnl_dereference(bond->all_slaves),
> +				skipslave);
>  		bond_skip_slave(rtnl_dereference(bond->usable_slaves),
>  				skipslave);
> +	}
> +	kfree_rcu(all_slaves, rcu);
> +	kfree_rcu(usable_slaves, rcu);
>  
>  	return ret;
>  }

None of the above code has anything to do directly with looking up the
bond slave in bond_xmit_get_slave; all of that, plus the bond_uninit
changes, should be done in refactoring patch(es) that prepare existing
code to be called from the new ndo.


* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20 14:01   ` Jiri Pirko
@ 2020-04-20 17:29     ` David Ahern
  2020-04-20 17:41       ` Jiri Pirko
  2020-04-20 17:54       ` Jiri Pirko
  0 siblings, 2 replies; 31+ messages in thread
From: David Ahern @ 2020-04-20 17:29 UTC (permalink / raw)
  To: Jiri Pirko, Maor Gottlieb
  Cc: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba, leonro,
	saeedm, jiri, linux-rdma, netdev, alexr

On 4/20/20 8:01 AM, Jiri Pirko wrote:
> Mon, Apr 20, 2020 at 09:54:17AM CEST, maorg@mellanox.com wrote:
>> Add new ndo to get the xmit slave of master device.
>> User should release the slave when it's no longer needed.
>> When slave selection method is based on hash, then the user can ask to
>> get the xmit slave assume all the slaves can transmit by setting the
>> LAG_FLAGS_HASH_ALL_SLAVES bit in the flags argument.
>>
>> Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
>> ---
>> include/linux/netdevice.h |  3 +++
>> include/net/lag.h         | 32 ++++++++++++++++++++++++++++++++
>> 2 files changed, 35 insertions(+)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 130a668049ab..e8852f3ad0b6 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1389,6 +1389,9 @@ struct net_device_ops {
>> 						 struct netlink_ext_ack *extack);
>> 	int			(*ndo_del_slave)(struct net_device *dev,
>> 						 struct net_device *slave_dev);
>> +	struct net_device*	(*ndo_xmit_get_slave)(struct net_device *master_dev,
>> +						      struct sk_buff *skb,
>> +						      u16 flags);
> 
> Please adjust the name to:
> ndo_get_lag_xmit_slave

I disagree. There are multiple master devices and no reason to have a
LAG specific get_slave.





* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20 17:29     ` David Ahern
@ 2020-04-20 17:41       ` Jiri Pirko
  2020-04-20 17:43         ` David Ahern
  2020-04-20 17:54       ` Jiri Pirko
  1 sibling, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2020-04-20 17:41 UTC (permalink / raw)
  To: David Ahern
  Cc: Maor Gottlieb, davem, jgg, dledford, j.vosburgh, vfalico, andy,
	kuba, leonro, saeedm, jiri, linux-rdma, netdev, alexr

Mon, Apr 20, 2020 at 07:29:15PM CEST, dsahern@gmail.com wrote:
>On 4/20/20 8:01 AM, Jiri Pirko wrote:
>> Mon, Apr 20, 2020 at 09:54:17AM CEST, maorg@mellanox.com wrote:
>>> Add new ndo to get the xmit slave of master device.
>>> User should release the slave when it's not longer needed.
>>> When slave selection method is based on hash, then the user can ask to
>>> get the xmit slave assume all the slaves can transmit by setting the
>>> LAG_FLAGS_HASH_ALL_SLAVES bit in the flags argument.
>>>
>>> Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
>>> ---
>>> include/linux/netdevice.h |  3 +++
>>> include/net/lag.h         | 32 ++++++++++++++++++++++++++++++++
>>> 2 files changed, 35 insertions(+)
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 130a668049ab..e8852f3ad0b6 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -1389,6 +1389,9 @@ struct net_device_ops {
>>> 						 struct netlink_ext_ack *extack);
>>> 	int			(*ndo_del_slave)(struct net_device *dev,
>>> 						 struct net_device *slave_dev);
>>> +	struct net_device*	(*ndo_xmit_get_slave)(struct net_device *master_dev,
>>> +						      struct sk_buff *skb,
>>> +						      u16 flags);
>> 
>> Please adjust the name to:
>> ndo_get_lag_xmit_slave
>
>I disagree. There are multiple master devices and no reason to have a
>LAG specific get_slave.

Do you have a use case for any other non-LAG master device type?
Note that the ndo name can be changed whenever needed. I think the name
should reflect the usage.


* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20 17:41       ` Jiri Pirko
@ 2020-04-20 17:43         ` David Ahern
  2020-04-20 17:53           ` Jiri Pirko
  0 siblings, 1 reply; 31+ messages in thread
From: David Ahern @ 2020-04-20 17:43 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Maor Gottlieb, davem, jgg, dledford, j.vosburgh, vfalico, andy,
	kuba, leonro, saeedm, jiri, linux-rdma, netdev, alexr

On 4/20/20 11:41 AM, Jiri Pirko wrote:
>>
>> I disagree. There are multiple master devices and no reason to have a
>> LAG specific get_slave.
> 
> Do you have usecase for any other non-lag master type device?
> Note the ndo name can change whenever needed. I think the name should
> reflect the usage.
> 

Right now, no. But nothing about the current need is LAG specific, so
don't make it seem like it is LAG specific with the name.


* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20 17:43         ` David Ahern
@ 2020-04-20 17:53           ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2020-04-20 17:53 UTC (permalink / raw)
  To: David Ahern
  Cc: Maor Gottlieb, davem, jgg, dledford, j.vosburgh, vfalico, andy,
	kuba, leonro, saeedm, jiri, linux-rdma, netdev, alexr

Mon, Apr 20, 2020 at 07:43:14PM CEST, dsahern@gmail.com wrote:
>On 4/20/20 11:41 AM, Jiri Pirko wrote:
>>>
>>> I disagree. There are multiple master devices and no reason to have a
>>> LAG specific get_slave.
>> 
>> Do you have usecase for any other non-lag master type device?
>> Note the ndo name can change whenever needed. I think the name should
>> reflect the usage.
>> 
>
>right now, no. But nothing about the current need is LAG specific, so
>don't make it seem like it is LAG specific with the name.

I don't really care; I just thought we could make the connection via the
name. Makes sense to me.


* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20 17:29     ` David Ahern
  2020-04-20 17:41       ` Jiri Pirko
@ 2020-04-20 17:54       ` Jiri Pirko
  2020-04-20 17:56         ` David Ahern
  1 sibling, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2020-04-20 17:54 UTC (permalink / raw)
  To: David Ahern
  Cc: Maor Gottlieb, davem, jgg, dledford, j.vosburgh, vfalico, andy,
	kuba, leonro, saeedm, jiri, linux-rdma, netdev, alexr

Mon, Apr 20, 2020 at 07:29:15PM CEST, dsahern@gmail.com wrote:
>On 4/20/20 8:01 AM, Jiri Pirko wrote:
>> Mon, Apr 20, 2020 at 09:54:17AM CEST, maorg@mellanox.com wrote:
>>> Add new ndo to get the xmit slave of master device.
>>> User should release the slave when it's not longer needed.
>>> When slave selection method is based on hash, then the user can ask to
>>> get the xmit slave assume all the slaves can transmit by setting the
>>> LAG_FLAGS_HASH_ALL_SLAVES bit in the flags argument.
>>>
>>> Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
>>> ---
>>> include/linux/netdevice.h |  3 +++
>>> include/net/lag.h         | 32 ++++++++++++++++++++++++++++++++
>>> 2 files changed, 35 insertions(+)
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 130a668049ab..e8852f3ad0b6 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -1389,6 +1389,9 @@ struct net_device_ops {
>>> 						 struct netlink_ext_ack *extack);
>>> 	int			(*ndo_del_slave)(struct net_device *dev,
>>> 						 struct net_device *slave_dev);
>>> +	struct net_device*	(*ndo_xmit_get_slave)(struct net_device *master_dev,
>>> +						      struct sk_buff *skb,
>>> +						      u16 flags);
>> 
>> Please adjust the name to:
>> ndo_get_lag_xmit_slave
>
>I disagree. There are multiple master devices and no reason to have a
>LAG specific get_slave.

Btw, did you notice that Maor is passing "lag" named values in the flags?


* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20 17:54       ` Jiri Pirko
@ 2020-04-20 17:56         ` David Ahern
  2020-04-20 18:01           ` Jiri Pirko
  0 siblings, 1 reply; 31+ messages in thread
From: David Ahern @ 2020-04-20 17:56 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Maor Gottlieb, davem, jgg, dledford, j.vosburgh, vfalico, andy,
	kuba, leonro, saeedm, jiri, linux-rdma, netdev, alexr

On 4/20/20 11:54 AM, Jiri Pirko wrote:
> Mon, Apr 20, 2020 at 07:29:15PM CEST, dsahern@gmail.com wrote:
>> On 4/20/20 8:01 AM, Jiri Pirko wrote:
>>> Mon, Apr 20, 2020 at 09:54:17AM CEST, maorg@mellanox.com wrote:
>>>> Add new ndo to get the xmit slave of master device.
>>>> User should release the slave when it's not longer needed.
>>>> When slave selection method is based on hash, then the user can ask to
>>>> get the xmit slave assume all the slaves can transmit by setting the
>>>> LAG_FLAGS_HASH_ALL_SLAVES bit in the flags argument.
>>>>
>>>> Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
>>>> ---
>>>> include/linux/netdevice.h |  3 +++
>>>> include/net/lag.h         | 32 ++++++++++++++++++++++++++++++++
>>>> 2 files changed, 35 insertions(+)
>>>>
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 130a668049ab..e8852f3ad0b6 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -1389,6 +1389,9 @@ struct net_device_ops {
>>>> 						 struct netlink_ext_ack *extack);
>>>> 	int			(*ndo_del_slave)(struct net_device *dev,
>>>> 						 struct net_device *slave_dev);
>>>> +	struct net_device*	(*ndo_xmit_get_slave)(struct net_device *master_dev,
>>>> +						      struct sk_buff *skb,
>>>> +						      u16 flags);
>>>
>>> Please adjust the name to:
>>> ndo_get_lag_xmit_slave
>>
>> I disagree. There are multiple master devices and no reason to have a
>> LAG specific get_slave.
> 
> Btw, did you notice that Maor is passing "lag" named values in the flags?
> 

Yes. I disagree with the enum name, but having LAG in the name of a flag
is fine. To me that is the right place for a LAG-specific request to a
generic ndo in core code.


* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20 17:56         ` David Ahern
@ 2020-04-20 18:01           ` Jiri Pirko
  2020-04-20 18:04             ` David Ahern
  0 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2020-04-20 18:01 UTC (permalink / raw)
  To: David Ahern
  Cc: Maor Gottlieb, davem, jgg, dledford, j.vosburgh, vfalico, andy,
	kuba, leonro, saeedm, jiri, linux-rdma, netdev, alexr

Mon, Apr 20, 2020 at 07:56:58PM CEST, dsahern@gmail.com wrote:
>On 4/20/20 11:54 AM, Jiri Pirko wrote:
>> Mon, Apr 20, 2020 at 07:29:15PM CEST, dsahern@gmail.com wrote:
>>> On 4/20/20 8:01 AM, Jiri Pirko wrote:
>>>> Mon, Apr 20, 2020 at 09:54:17AM CEST, maorg@mellanox.com wrote:
>>>>> Add new ndo to get the xmit slave of master device.
>>>>> User should release the slave when it's not longer needed.
>>>>> When slave selection method is based on hash, then the user can ask to
>>>>> get the xmit slave assume all the slaves can transmit by setting the
>>>>> LAG_FLAGS_HASH_ALL_SLAVES bit in the flags argument.
>>>>>
>>>>> Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
>>>>> ---
>>>>> include/linux/netdevice.h |  3 +++
>>>>> include/net/lag.h         | 32 ++++++++++++++++++++++++++++++++
>>>>> 2 files changed, 35 insertions(+)
>>>>>
>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>> index 130a668049ab..e8852f3ad0b6 100644
>>>>> --- a/include/linux/netdevice.h
>>>>> +++ b/include/linux/netdevice.h
>>>>> @@ -1389,6 +1389,9 @@ struct net_device_ops {
>>>>> 						 struct netlink_ext_ack *extack);
>>>>> 	int			(*ndo_del_slave)(struct net_device *dev,
>>>>> 						 struct net_device *slave_dev);
>>>>> +	struct net_device*	(*ndo_xmit_get_slave)(struct net_device *master_dev,
>>>>> +						      struct sk_buff *skb,
>>>>> +						      u16 flags);
>>>>
>>>> Please adjust the name to:
>>>> ndo_get_lag_xmit_slave
>>>
>>> I disagree. There are multiple master devices and no reason to have a
>>> LAG specific get_slave.
>> 
>> Btw, did you notice that Maor is passing "lag" named values in the flags?
>> 
>
>yes. I disagree with enum name, but having LAG in the name of a flag is
>fine. To me that is the right place for a LAG specific request of a
>generic ndo in core code.

Generic ndo with a LAG-specific arg? Odd. Plus, there is only a small
chance this will ever be used for another master type. And if so, it
could very easily be renamed then...


* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20 18:01           ` Jiri Pirko
@ 2020-04-20 18:04             ` David Ahern
  2020-04-20 18:48               ` Jiri Pirko
  0 siblings, 1 reply; 31+ messages in thread
From: David Ahern @ 2020-04-20 18:04 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Maor Gottlieb, davem, jgg, dledford, j.vosburgh, vfalico, andy,
	kuba, leonro, saeedm, jiri, linux-rdma, netdev, alexr

On 4/20/20 12:01 PM, Jiri Pirko wrote:
> Generic ndo with lag-specific arg? Odd. Plus, there is a small chance
> this is ever going to be used for other master. And if so, could be very
> easily renamed then...

Core code should be generic, not specific and then renamed at a later
date when a second use case arises.


* Re: [PATCH V2 mlx5-next 03/10] bonding: Add helpers to get xmit slave
  2020-04-20 14:27   ` Jiri Pirko
@ 2020-04-20 18:46     ` Jay Vosburgh
  2020-04-20 18:59       ` Maor Gottlieb
  0 siblings, 1 reply; 31+ messages in thread
From: Jay Vosburgh @ 2020-04-20 18:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Maor Gottlieb, davem, jgg, dledford, vfalico, andy, kuba, leonro,
	saeedm, jiri, linux-rdma, netdev, alexr

Jiri Pirko <jiri@resnulli.us> wrote:

>Mon, Apr 20, 2020 at 09:54:19AM CEST, maorg@mellanox.com wrote:
>>These helpers will be used by both the xmit function
>>and the get xmit slave ndo.
>
>Be more verbose about what you are doing please. From this I have no
>clue what is going on.

	Agreed, and also with Jiri's comment further down to split this
into multiple patches.  The current series is difficult to follow.

	-J

>
>>
>>Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
>>---
>> drivers/net/bonding/bond_alb.c  | 35 ++++++++----
>> drivers/net/bonding/bond_main.c | 94 +++++++++++++++++++++------------
>> include/net/bond_alb.h          |  4 ++
>> 3 files changed, 89 insertions(+), 44 deletions(-)
>>
>>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>>index 7bb49b049dcc..e863c694c309 100644
>>--- a/drivers/net/bonding/bond_alb.c
>>+++ b/drivers/net/bonding/bond_alb.c
>>@@ -1334,11 +1334,11 @@ static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
>> 	return NETDEV_TX_OK;
>> }
>> 
>>-netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>>+struct slave *bond_xmit_tlb_slave_get(struct bonding *bond,
>>+				      struct sk_buff *skb)
>> {
>>-	struct bonding *bond = netdev_priv(bond_dev);
>>-	struct ethhdr *eth_data;
>> 	struct slave *tx_slave = NULL;
>>+	struct ethhdr *eth_data;
>> 	u32 hash_index;
>> 
>> 	skb_reset_mac_header(skb);
>>@@ -1369,20 +1369,29 @@ netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>> 			break;
>> 		}
>> 	}
>>-	return bond_do_alb_xmit(skb, bond, tx_slave);
>>+	return tx_slave;
>> }
>> 
>>-netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>>+netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>> {
>> 	struct bonding *bond = netdev_priv(bond_dev);
>>-	struct ethhdr *eth_data;
>>+	struct slave *tx_slave;
>>+
>>+	tx_slave = bond_xmit_tlb_slave_get(bond, skb);
>>+	return bond_do_alb_xmit(skb, bond, tx_slave);
>>+}
>>+
>>+struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
>>+				      struct sk_buff *skb)
>>+{
>> 	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>>-	struct slave *tx_slave = NULL;
>> 	static const __be32 ip_bcast = htonl(0xffffffff);
>>-	int hash_size = 0;
>>+	struct slave *tx_slave = NULL;
>>+	const u8 *hash_start = NULL;
>> 	bool do_tx_balance = true;
>>+	struct ethhdr *eth_data;
>> 	u32 hash_index = 0;
>>-	const u8 *hash_start = NULL;
>>+	int hash_size = 0;
>> 
>> 	skb_reset_mac_header(skb);
>> 	eth_data = eth_hdr(skb);
>>@@ -1501,7 +1510,15 @@ netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>> 						       count];
>> 		}
>> 	}
>>+	return tx_slave;
>>+}
>>+
>>+netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>>+{
>>+	struct bonding *bond = netdev_priv(bond_dev);
>>+	struct slave *tx_slave = NULL;
>> 
>>+	tx_slave = bond_xmit_alb_slave_get(bond, skb);
>> 	return bond_do_alb_xmit(skb, bond, tx_slave);
>> }
>> 
>>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>index 2cb41d480ae2..7e04be86fda8 100644
>>--- a/drivers/net/bonding/bond_main.c
>>+++ b/drivers/net/bonding/bond_main.c
>>@@ -82,6 +82,7 @@
>> #include <net/bonding.h>
>> #include <net/bond_3ad.h>
>> #include <net/bond_alb.h>
>>+#include <net/lag.h>
>> 
>> #include "bonding_priv.h"
>> 
>>@@ -3406,10 +3407,26 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
>> 		(__force u32)flow_get_u32_src(&flow);
>> 	hash ^= (hash >> 16);
>> 	hash ^= (hash >> 8);
>>-
>
>Please avoid changes like this one.
>
>
>> 	return hash >> 1;
>> }
>> 
>>+static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond,
>>+						 struct sk_buff *skb,
>>+						 struct bond_up_slave *slaves)
>>+{
>>+	struct slave *slave;
>>+	unsigned int count;
>>+	u32 hash;
>>+
>>+	hash = bond_xmit_hash(bond, skb);
>>+	count = slaves ? READ_ONCE(slaves->count) : 0;
>>+	if (unlikely(!count))
>>+		return NULL;
>>+
>>+	slave = slaves->arr[hash % count];
>>+	return slave;
>>+}
>
>Why don't you have this helper near bond_3ad_xor_xmit() as you have for
>round robin for example?
>
>I think it would make this patch much easier to review if you split to
>multiple patches, per-mode.
>
>
>>+
>> /*-------------------------- Device entry points ----------------------------*/
>> 
>> void bond_work_init_all(struct bonding *bond)
>>@@ -3923,16 +3940,15 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
>> }
>> 
>> /**
>>- * bond_xmit_slave_id - transmit skb through slave with slave_id
>>+ * bond_get_slave_by_id - get xmit slave with slave_id
>>  * @bond: bonding device that is transmitting
>>- * @skb: buffer to transmit
>>  * @slave_id: slave id up to slave_cnt-1 through which to transmit
>>  *
>>- * This function tries to transmit through slave with slave_id but in case
>>+ * This function tries to get slave with slave_id but in case
>>  * it fails, it tries to find the first available slave for transmission.
>>- * The skb is consumed in all cases, thus the function is void.
>>  */
>>-static void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int slave_id)
>>+static struct slave *bond_get_slave_by_id(struct bonding *bond,
>>+					  int slave_id)
>> {
>> 	struct list_head *iter;
>> 	struct slave *slave;
>>@@ -3941,10 +3957,8 @@ static void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int sl
>> 	/* Here we start from the slave with slave_id */
>> 	bond_for_each_slave_rcu(bond, slave, iter) {
>> 		if (--i < 0) {
>>-			if (bond_slave_can_tx(slave)) {
>>-				bond_dev_queue_xmit(bond, skb, slave->dev);
>>-				return;
>>-			}
>>+			if (bond_slave_can_tx(slave))
>>+				return slave;
>> 		}
>> 	}
>> 
>>@@ -3953,13 +3967,11 @@ static void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int sl
>> 	bond_for_each_slave_rcu(bond, slave, iter) {
>> 		if (--i < 0)
>> 			break;
>>-		if (bond_slave_can_tx(slave)) {
>>-			bond_dev_queue_xmit(bond, skb, slave->dev);
>>-			return;
>>-		}
>>+		if (bond_slave_can_tx(slave))
>>+			return slave;
>> 	}
>>-	/* no slave that can tx has been found */
>>-	bond_tx_drop(bond->dev, skb);
>>+
>>+	return NULL;
>> }
>> 
>> /**
>>@@ -3995,10 +4007,9 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
>> 	return slave_id;
>> }
>> 
>>-static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
>>-					struct net_device *bond_dev)
>>+static struct slave *bond_xmit_roundrobin_slave_get(struct bonding *bond,
>>+						    struct sk_buff *skb)
>> {
>>-	struct bonding *bond = netdev_priv(bond_dev);
>> 	struct slave *slave;
>> 	int slave_cnt;
>> 	u32 slave_id;
>>@@ -4020,24 +4031,40 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
>> 		if (iph->protocol == IPPROTO_IGMP) {
>> 			slave = rcu_dereference(bond->curr_active_slave);
>> 			if (slave)
>>-				bond_dev_queue_xmit(bond, skb, slave->dev);
>>-			else
>>-				bond_xmit_slave_id(bond, skb, 0);
>>-			return NETDEV_TX_OK;
>>+				return slave;
>>+			return bond_get_slave_by_id(bond, 0);
>> 		}
>> 	}
>> 
>> non_igmp:
>> 	slave_cnt = READ_ONCE(bond->slave_cnt);
>> 	if (likely(slave_cnt)) {
>>-		slave_id = bond_rr_gen_slave_id(bond);
>>-		bond_xmit_slave_id(bond, skb, slave_id % slave_cnt);
>>-	} else {
>>-		bond_tx_drop(bond_dev, skb);
>>+		slave_id = bond_rr_gen_slave_id(bond) % slave_cnt;
>>+		return bond_get_slave_by_id(bond, slave_id);
>> 	}
>>+	return NULL;
>>+}
>>+
>>+static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
>>+					struct net_device *bond_dev)
>>+{
>>+	struct bonding *bond = netdev_priv(bond_dev);
>>+	struct slave *slave;
>>+
>>+	slave = bond_xmit_roundrobin_slave_get(bond, skb);
>>+	if (slave)
>>+		bond_dev_queue_xmit(bond, skb, slave->dev);
>>+	else
>>+		bond_tx_drop(bond_dev, skb);
>> 	return NETDEV_TX_OK;
>> }
>> 
>>+static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
>>+						      struct sk_buff *skb)
>>+{
>>+	return rcu_dereference(bond->curr_active_slave);
>>+}
>>+
>> /* In active-backup mode, we know that bond->curr_active_slave is always valid if
>>  * the bond has a usable interface.
>>  */
>>@@ -4047,7 +4074,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
>> 	struct bonding *bond = netdev_priv(bond_dev);
>> 	struct slave *slave;
>> 
>>-	slave = rcu_dereference(bond->curr_active_slave);
>>+	slave = bond_xmit_activebackup_slave_get(bond, skb);
>> 	if (slave)
>> 		bond_dev_queue_xmit(bond, skb, slave->dev);
>> 	else
>>@@ -4193,18 +4220,15 @@ static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb,
>> 				     struct net_device *dev)
>> {
>> 	struct bonding *bond = netdev_priv(dev);
>>-	struct slave *slave;
>> 	struct bond_up_slave *slaves;
>>-	unsigned int count;
>>+	struct slave *slave;
>> 
>> 	slaves = rcu_dereference(bond->usable_slaves);
>>-	count = slaves ? READ_ONCE(slaves->count) : 0;
>>-	if (likely(count)) {
>>-		slave = slaves->arr[bond_xmit_hash(bond, skb) % count];
>>+	slave = bond_xmit_3ad_xor_slave_get(bond, skb, slaves);
>>+	if (likely(slave))
>> 		bond_dev_queue_xmit(bond, skb, slave->dev);
>>-	} else {
>>+	else
>> 		bond_tx_drop(dev, skb);
>>-	}
>> 
>> 	return NETDEV_TX_OK;
>> }
>>diff --git a/include/net/bond_alb.h b/include/net/bond_alb.h
>>index b3504fcd773d..f6af76c87a6c 100644
>>--- a/include/net/bond_alb.h
>>+++ b/include/net/bond_alb.h
>>@@ -158,6 +158,10 @@ void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char
>> void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave);
>> int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev);
>> int bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev);
>>+struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
>>+				      struct sk_buff *skb);
>>+struct slave *bond_xmit_tlb_slave_get(struct bonding *bond,
>>+				      struct sk_buff *skb);
>> void bond_alb_monitor(struct work_struct *);
>> int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr);
>> void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id);
>>-- 
>>2.17.2
>>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 31+ messages in thread
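The round-robin rework quoted above splits slave selection (bond_xmit_roundrobin_slave_get() / bond_get_slave_by_id()) out of the transmit path. The selection itself reduces to "start at rr_id % count, fall forward to the first slave that can tx, wrapping once". Below is a minimal userspace model of that search; struct toy_slave and all names here are invented for illustration, not the kernel's types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct slave: just an id and a tx state. */
struct toy_slave {
	int id;
	bool can_tx;
};

/* Mirrors the shape of bond_get_slave_by_id() after the rework:
 * try the slave at slave_id first, then wrap around and return the
 * first slave that can transmit, or NULL if none can (the caller
 * then drops the skb). */
static struct toy_slave *get_slave_by_id(struct toy_slave *slaves,
					 int count, int slave_id)
{
	/* First pass: from slave_id to the end of the array. */
	for (int i = slave_id; i < count; i++)
		if (slaves[i].can_tx)
			return &slaves[i];
	/* Wrap: from the start up to (but not including) slave_id. */
	for (int i = 0; i < slave_id; i++)
		if (slaves[i].can_tx)
			return &slaves[i];
	return NULL;
}
```

Returning the slave instead of transmitting directly is what lets the same helper back both the xmit function and the new get-xmit-slave ndo.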

* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20 18:04             ` David Ahern
@ 2020-04-20 18:48               ` Jiri Pirko
  2020-04-20 18:56                 ` Maor Gottlieb
  0 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2020-04-20 18:48 UTC (permalink / raw)
  To: David Ahern
  Cc: Maor Gottlieb, davem, jgg, dledford, j.vosburgh, vfalico, andy,
	kuba, leonro, saeedm, jiri, linux-rdma, netdev, alexr

Mon, Apr 20, 2020 at 08:04:01PM CEST, dsahern@gmail.com wrote:
>On 4/20/20 12:01 PM, Jiri Pirko wrote:
>> Generic ndo with lag-specific arg? Odd. Plus, there is a small chance
>> this is ever going to be used for other master. And if so, could be very
>> easily renamed then...
>
>core code should be generic, not specific and renamed at a later date
>when a second use case arises.

Yeah, I guess we just have to agree to disagree :)



* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20 18:48               ` Jiri Pirko
@ 2020-04-20 18:56                 ` Maor Gottlieb
  2020-04-20 19:02                   ` David Ahern
  0 siblings, 1 reply; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-20 18:56 UTC (permalink / raw)
  To: Jiri Pirko, David Ahern
  Cc: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba, leonro,
	saeedm, jiri, linux-rdma, netdev, alexr


On 4/20/2020 9:48 PM, Jiri Pirko wrote:
> Mon, Apr 20, 2020 at 08:04:01PM CEST, dsahern@gmail.com wrote:
>> On 4/20/20 12:01 PM, Jiri Pirko wrote:
>>> Generic ndo with lag-specific arg? Odd. Plus, there is a small chance
>>> this is ever going to be used for other master. And if so, could be very
>>> easily renamed then...
>> core code should be generic, not specific and renamed at a later date
>> when a second use case arises.
> Yeah, I guess we just have to agree to disagree :)

So I will stick with the flags. Any suggestion for a better name for the 
enum? Should I move master_xmit_get_slave from lag.h to netdevice.h?
>


* Re: [PATCH V2 mlx5-next 03/10] bonding: Add helpers to get xmit slave
  2020-04-20 18:46     ` Jay Vosburgh
@ 2020-04-20 18:59       ` Maor Gottlieb
  0 siblings, 0 replies; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-20 18:59 UTC (permalink / raw)
  To: Jay Vosburgh, Jiri Pirko
  Cc: davem, jgg, dledford, vfalico, andy, kuba, leonro, saeedm, jiri,
	linux-rdma, netdev, alexr


On 4/20/2020 9:46 PM, Jay Vosburgh wrote:
> Jiri Pirko <jiri@resnulli.us> wrote:
>
>> Mon, Apr 20, 2020 at 09:54:19AM CEST, maorg@mellanox.com wrote:
>>> These helpers will be used by both the xmit function
>>> and the get xmit slave ndo.
>> Be more verbose about what you are doing please. From this I have no
>> clue what is going on.
> 	Agreed, and also with Jiri's comment further down to split this
> into multiple patches.  The current series is difficult to follow.
>
> 	-J

Agreed, I am splitting this series into more patches so it will be easier 
to follow.
>>> Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
>>> ---
>>> drivers/net/bonding/bond_alb.c  | 35 ++++++++----
>>> drivers/net/bonding/bond_main.c | 94 +++++++++++++++++++++------------
>>> include/net/bond_alb.h          |  4 ++
>>> 3 files changed, 89 insertions(+), 44 deletions(-)
>>>
>>> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>>> index 7bb49b049dcc..e863c694c309 100644
>>> --- a/drivers/net/bonding/bond_alb.c
>>> +++ b/drivers/net/bonding/bond_alb.c
>>> @@ -1334,11 +1334,11 @@ static netdev_tx_t bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
>>> 	return NETDEV_TX_OK;
>>> }
>>>
>>> -netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>>> +struct slave *bond_xmit_tlb_slave_get(struct bonding *bond,
>>> +				      struct sk_buff *skb)
>>> {
>>> -	struct bonding *bond = netdev_priv(bond_dev);
>>> -	struct ethhdr *eth_data;
>>> 	struct slave *tx_slave = NULL;
>>> +	struct ethhdr *eth_data;
>>> 	u32 hash_index;
>>>
>>> 	skb_reset_mac_header(skb);
>>> @@ -1369,20 +1369,29 @@ netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>>> 			break;
>>> 		}
>>> 	}
>>> -	return bond_do_alb_xmit(skb, bond, tx_slave);
>>> +	return tx_slave;
>>> }
>>>
>>> -netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>>> +netdev_tx_t bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>>> {
>>> 	struct bonding *bond = netdev_priv(bond_dev);
>>> -	struct ethhdr *eth_data;
>>> +	struct slave *tx_slave;
>>> +
>>> +	tx_slave = bond_xmit_tlb_slave_get(bond, skb);
>>> +	return bond_do_alb_xmit(skb, bond, tx_slave);
>>> +}
>>> +
>>> +struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
>>> +				      struct sk_buff *skb)
>>> +{
>>> 	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>>> -	struct slave *tx_slave = NULL;
>>> 	static const __be32 ip_bcast = htonl(0xffffffff);
>>> -	int hash_size = 0;
>>> +	struct slave *tx_slave = NULL;
>>> +	const u8 *hash_start = NULL;
>>> 	bool do_tx_balance = true;
>>> +	struct ethhdr *eth_data;
>>> 	u32 hash_index = 0;
>>> -	const u8 *hash_start = NULL;
>>> +	int hash_size = 0;
>>>
>>> 	skb_reset_mac_header(skb);
>>> 	eth_data = eth_hdr(skb);
>>> @@ -1501,7 +1510,15 @@ netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>>> 						       count];
>>> 		}
>>> 	}
>>> +	return tx_slave;
>>> +}
>>> +
>>> +netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
>>> +{
>>> +	struct bonding *bond = netdev_priv(bond_dev);
>>> +	struct slave *tx_slave = NULL;
>>>
>>> +	tx_slave = bond_xmit_alb_slave_get(bond, skb);
>>> 	return bond_do_alb_xmit(skb, bond, tx_slave);
>>> }
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 2cb41d480ae2..7e04be86fda8 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -82,6 +82,7 @@
>>> #include <net/bonding.h>
>>> #include <net/bond_3ad.h>
>>> #include <net/bond_alb.h>
>>> +#include <net/lag.h>
>>>
>>> #include "bonding_priv.h"
>>>
>>> @@ -3406,10 +3407,26 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
>>> 		(__force u32)flow_get_u32_src(&flow);
>>> 	hash ^= (hash >> 16);
>>> 	hash ^= (hash >> 8);
>>> -
>> Please avoid changes like this one.
>>
>>
>>> 	return hash >> 1;
>>> }
>>>
>>> +static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond,
>>> +						 struct sk_buff *skb,
>>> +						 struct bond_up_slave *slaves)
>>> +{
>>> +	struct slave *slave;
>>> +	unsigned int count;
>>> +	u32 hash;
>>> +
>>> +	hash = bond_xmit_hash(bond, skb);
>>> +	count = slaves ? READ_ONCE(slaves->count) : 0;
>>> +	if (unlikely(!count))
>>> +		return NULL;
>>> +
>>> +	slave = slaves->arr[hash % count];
>>> +	return slave;
>>> +}
>> Why don't you have this helper near bond_3ad_xor_xmit() as you have for
>> round robin for example?

No good reason, I will move it nearby.
>>
>> I think it would make this patch much easier to review if you split to
>> multiple patches, per-mode.
>>
>>
>>> +
>>> /*-------------------------- Device entry points ----------------------------*/
>>>
>>> void bond_work_init_all(struct bonding *bond)
>>> @@ -3923,16 +3940,15 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
>>> }
>>>
>>> /**
>>> - * bond_xmit_slave_id - transmit skb through slave with slave_id
>>> + * bond_get_slave_by_id - get xmit slave with slave_id
>>>   * @bond: bonding device that is transmitting
>>> - * @skb: buffer to transmit
>>>   * @slave_id: slave id up to slave_cnt-1 through which to transmit
>>>   *
>>> - * This function tries to transmit through slave with slave_id but in case
>>> + * This function tries to get slave with slave_id but in case
>>>   * it fails, it tries to find the first available slave for transmission.
>>> - * The skb is consumed in all cases, thus the function is void.
>>>   */
>>> -static void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int slave_id)
>>> +static struct slave *bond_get_slave_by_id(struct bonding *bond,
>>> +					  int slave_id)
>>> {
>>> 	struct list_head *iter;
>>> 	struct slave *slave;
>>> @@ -3941,10 +3957,8 @@ static void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int sl
>>> 	/* Here we start from the slave with slave_id */
>>> 	bond_for_each_slave_rcu(bond, slave, iter) {
>>> 		if (--i < 0) {
>>> -			if (bond_slave_can_tx(slave)) {
>>> -				bond_dev_queue_xmit(bond, skb, slave->dev);
>>> -				return;
>>> -			}
>>> +			if (bond_slave_can_tx(slave))
>>> +				return slave;
>>> 		}
>>> 	}
>>>
>>> @@ -3953,13 +3967,11 @@ static void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int sl
>>> 	bond_for_each_slave_rcu(bond, slave, iter) {
>>> 		if (--i < 0)
>>> 			break;
>>> -		if (bond_slave_can_tx(slave)) {
>>> -			bond_dev_queue_xmit(bond, skb, slave->dev);
>>> -			return;
>>> -		}
>>> +		if (bond_slave_can_tx(slave))
>>> +			return slave;
>>> 	}
>>> -	/* no slave that can tx has been found */
>>> -	bond_tx_drop(bond->dev, skb);
>>> +
>>> +	return NULL;
>>> }
>>>
>>> /**
>>> @@ -3995,10 +4007,9 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
>>> 	return slave_id;
>>> }
>>>
>>> -static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
>>> -					struct net_device *bond_dev)
>>> +static struct slave *bond_xmit_roundrobin_slave_get(struct bonding *bond,
>>> +						    struct sk_buff *skb)
>>> {
>>> -	struct bonding *bond = netdev_priv(bond_dev);
>>> 	struct slave *slave;
>>> 	int slave_cnt;
>>> 	u32 slave_id;
>>> @@ -4020,24 +4031,40 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
>>> 		if (iph->protocol == IPPROTO_IGMP) {
>>> 			slave = rcu_dereference(bond->curr_active_slave);
>>> 			if (slave)
>>> -				bond_dev_queue_xmit(bond, skb, slave->dev);
>>> -			else
>>> -				bond_xmit_slave_id(bond, skb, 0);
>>> -			return NETDEV_TX_OK;
>>> +				return slave;
>>> +			return bond_get_slave_by_id(bond, 0);
>>> 		}
>>> 	}
>>>
>>> non_igmp:
>>> 	slave_cnt = READ_ONCE(bond->slave_cnt);
>>> 	if (likely(slave_cnt)) {
>>> -		slave_id = bond_rr_gen_slave_id(bond);
>>> -		bond_xmit_slave_id(bond, skb, slave_id % slave_cnt);
>>> -	} else {
>>> -		bond_tx_drop(bond_dev, skb);
>>> +		slave_id = bond_rr_gen_slave_id(bond) % slave_cnt;
>>> +		return bond_get_slave_by_id(bond, slave_id);
>>> 	}
>>> +	return NULL;
>>> +}
>>> +
>>> +static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
>>> +					struct net_device *bond_dev)
>>> +{
>>> +	struct bonding *bond = netdev_priv(bond_dev);
>>> +	struct slave *slave;
>>> +
>>> +	slave = bond_xmit_roundrobin_slave_get(bond, skb);
>>> +	if (slave)
>>> +		bond_dev_queue_xmit(bond, skb, slave->dev);
>>> +	else
>>> +		bond_tx_drop(bond_dev, skb);
>>> 	return NETDEV_TX_OK;
>>> }
>>>
>>> +static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
>>> +						      struct sk_buff *skb)
>>> +{
>>> +	return rcu_dereference(bond->curr_active_slave);
>>> +}
>>> +
>>> /* In active-backup mode, we know that bond->curr_active_slave is always valid if
>>>   * the bond has a usable interface.
>>>   */
>>> @@ -4047,7 +4074,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
>>> 	struct bonding *bond = netdev_priv(bond_dev);
>>> 	struct slave *slave;
>>>
>>> -	slave = rcu_dereference(bond->curr_active_slave);
>>> +	slave = bond_xmit_activebackup_slave_get(bond, skb);
>>> 	if (slave)
>>> 		bond_dev_queue_xmit(bond, skb, slave->dev);
>>> 	else
>>> @@ -4193,18 +4220,15 @@ static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb,
>>> 				     struct net_device *dev)
>>> {
>>> 	struct bonding *bond = netdev_priv(dev);
>>> -	struct slave *slave;
>>> 	struct bond_up_slave *slaves;
>>> -	unsigned int count;
>>> +	struct slave *slave;
>>>
>>> 	slaves = rcu_dereference(bond->usable_slaves);
>>> -	count = slaves ? READ_ONCE(slaves->count) : 0;
>>> -	if (likely(count)) {
>>> -		slave = slaves->arr[bond_xmit_hash(bond, skb) % count];
>>> +	slave = bond_xmit_3ad_xor_slave_get(bond, skb, slaves);
>>> +	if (likely(slave))
>>> 		bond_dev_queue_xmit(bond, skb, slave->dev);
>>> -	} else {
>>> +	else
>>> 		bond_tx_drop(dev, skb);
>>> -	}
>>>
>>> 	return NETDEV_TX_OK;
>>> }
>>> diff --git a/include/net/bond_alb.h b/include/net/bond_alb.h
>>> index b3504fcd773d..f6af76c87a6c 100644
>>> --- a/include/net/bond_alb.h
>>> +++ b/include/net/bond_alb.h
>>> @@ -158,6 +158,10 @@ void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char
>>> void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave);
>>> int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev);
>>> int bond_tlb_xmit(struct sk_buff *skb, struct net_device *bond_dev);
>>> +struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
>>> +				      struct sk_buff *skb);
>>> +struct slave *bond_xmit_tlb_slave_get(struct bonding *bond,
>>> +				      struct sk_buff *skb);
>>> void bond_alb_monitor(struct work_struct *);
>>> int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr);
>>> void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id);
>>> -- 
>>> 2.17.2
>>>
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com

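For the 3ad/xor path discussed in the message above, the factored-out helper amounts to folding the flow hash (as in the quoted tail of bond_xmit_hash()) and indexing the usable-slaves array modulo its count. A hypothetical userspace sketch of just that arithmetic, with toy types rather than the kernel's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_slave { int id; };

/* Same folding as the end of the quoted bond_xmit_hash(). */
static uint32_t fold_hash(uint32_t hash)
{
	hash ^= hash >> 16;
	hash ^= hash >> 8;
	return hash >> 1;
}

/* Shape of bond_xmit_3ad_xor_slave_get(): NULL when there are no
 * usable slaves (the caller then drops the packet), otherwise the
 * slave at the folded hash modulo the slave count. */
static struct toy_slave *xor_slave_get(struct toy_slave *arr,
				       unsigned int count,
				       uint32_t flow_hash)
{
	if (count == 0)
		return NULL;
	return &arr[fold_hash(flow_hash) % count];
}
```

Because the index depends only on the flow hash and the slave count, calling the helper from the new ndo yields the same slave the xmit path would have picked for that flow.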

* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20 18:56                 ` Maor Gottlieb
@ 2020-04-20 19:02                   ` David Ahern
  2020-04-21  5:37                     ` Jiri Pirko
  0 siblings, 1 reply; 31+ messages in thread
From: David Ahern @ 2020-04-20 19:02 UTC (permalink / raw)
  To: Maor Gottlieb, Jiri Pirko
  Cc: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba, leonro,
	saeedm, jiri, linux-rdma, netdev, alexr

On 4/20/20 12:56 PM, Maor Gottlieb wrote:
> 
> On 4/20/2020 9:48 PM, Jiri Pirko wrote:
>> Mon, Apr 20, 2020 at 08:04:01PM CEST, dsahern@gmail.com wrote:
>>> On 4/20/20 12:01 PM, Jiri Pirko wrote:
>>>> Generic ndo with lag-specific arg? Odd. Plus, there is a small chance
>>>> this is ever going to be used for other master. And if so, could be
>>>> very
>>>> easily renamed then...
>>> core code should be generic, not specific and renamed at a later date
>>> when a second use case arises.
>> Yeah, I guess we just have to agree to disagree :)
> 
> So I will stick with the flags. Any suggestion for a better name for the
> enum? Should I move master_xmit_get_slave from lag.h to netdevice.h?
>>

IMHO, yes, that is a better place.

Generic ndo name and implementation; type-specific flag as needed.

This is consistent with net_device and ndo - both generic concepts -
with specifics relegated to flags (e.g., IFF_*)


* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-20 19:02                   ` David Ahern
@ 2020-04-21  5:37                     ` Jiri Pirko
  2020-04-21  5:43                       ` Maor Gottlieb
  0 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2020-04-21  5:37 UTC (permalink / raw)
  To: David Ahern
  Cc: Maor Gottlieb, davem, jgg, dledford, j.vosburgh, vfalico, andy,
	kuba, leonro, saeedm, jiri, linux-rdma, netdev, alexr

Mon, Apr 20, 2020 at 09:02:58PM CEST, dsahern@gmail.com wrote:
>On 4/20/20 12:56 PM, Maor Gottlieb wrote:
>> 
>> On 4/20/2020 9:48 PM, Jiri Pirko wrote:
>>> Mon, Apr 20, 2020 at 08:04:01PM CEST, dsahern@gmail.com wrote:
>>>> On 4/20/20 12:01 PM, Jiri Pirko wrote:
>>>>> Generic ndo with lag-specific arg? Odd. Plus, there is a small chance
>>>>> this is ever going to be used for other master. And if so, could be
>>>>> very
>>>>> easily renamed then...
>>>> core code should be generic, not specific and renamed at a later date
>>>> when a second use case arises.
>>> Yeah, I guess we just have to agree to disagree :)
>> 
>> So I will stick with the flags. Any suggestion for a better name for the
>> enum? Should I move master_xmit_get_slave from lag.h to netdevice.h?
>>>
>
>IMHO, yes, that is a better place.
>
>generic ndo name and implementation.
>type specific flag as needed.
>
>This is consistent with net_device and ndo - both generic concepts -
>with specifics relegated to flags (e.g., IFF_*)

Why is there a need for flags? Why can't a single bool do, as I suggested?
Do you see any use case for another flag?



* Re: [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get
  2020-04-21  5:37                     ` Jiri Pirko
@ 2020-04-21  5:43                       ` Maor Gottlieb
  0 siblings, 0 replies; 31+ messages in thread
From: Maor Gottlieb @ 2020-04-21  5:43 UTC (permalink / raw)
  To: Jiri Pirko, David Ahern
  Cc: davem, jgg, dledford, j.vosburgh, vfalico, andy, kuba, leonro,
	saeedm, jiri, linux-rdma, netdev, alexr


On 4/21/2020 8:37 AM, Jiri Pirko wrote:
> Mon, Apr 20, 2020 at 09:02:58PM CEST, dsahern@gmail.com wrote:
>> On 4/20/20 12:56 PM, Maor Gottlieb wrote:
>>> On 4/20/2020 9:48 PM, Jiri Pirko wrote:
>>>> Mon, Apr 20, 2020 at 08:04:01PM CEST, dsahern@gmail.com wrote:
>>>>> On 4/20/20 12:01 PM, Jiri Pirko wrote:
>>>>>> Generic ndo with lag-specific arg? Odd. Plus, there is a small chance
>>>>>> this is ever going to be used for other master. And if so, could be
>>>>>> very
>>>>>> easily renamed then...
>>>>> core code should be generic, not specific and renamed at a later date
>>>>> when a second use case arises.
>>>> Yeah, I guess we just have to agree to disagree :)
>>> So I will stick with the flags. Any suggestion for a better name for the
>>> enum? Should I move master_xmit_get_slave from lag.h to netdevice.h?
>> IMHO, yes, that is a better place.
>>
>> generic ndo name and implementation.
>> type specific flag as needed.
>>
>> This is consistent with net_device and ndo - both generic concepts -
>> with specifics relegated to flags (e.g., IFF_*)
> Why is there a need for flags? Why can't a single bool do, as I suggested?
> Do you see any use case for another flag?

Currently no. I am okay with a single bool.

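The outcome of the discussion above, a generic op taking a single bool rather than a LAG-specific flags enum, can be modeled as follows. This is a userspace mock of the shape being debated; the struct members and the toy implementation are invented for illustration and are not the kernel's actual definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_buff;		/* opaque in this sketch */
struct net_device_ops;

struct net_device {
	const struct net_device_ops *netdev_ops;
	struct net_device *active_slave;	/* toy state for the mock */
};

struct net_device_ops {
	/* Generic "which slave would this skb go out on" op; the
	 * single bool replaces the flags enum debated in the thread. */
	struct net_device *(*ndo_xmit_slave_get)(struct net_device *master,
						 struct sk_buff *skb,
						 bool all_slaves);
};

/* Toy bond implementation: ignore the skb and return the slave the
 * mock master currently points at. */
static struct net_device *toy_xmit_slave_get(struct net_device *master,
					     struct sk_buff *skb,
					     bool all_slaves)
{
	(void)skb;
	(void)all_slaves;
	return master->active_slave;
}
```

A caller such as the RDMA LAG code would then build an skb from the AH attributes and invoke master->netdev_ops->ndo_xmit_slave_get(master, skb, false) to learn the tx affinity.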

end of thread (newest message: 2020-04-21  5:43 UTC)

Thread overview: 31+ messages
2020-04-20  7:54 [PATCH V2 mlx5-next 00/10] Add support to get xmit slave Maor Gottlieb
2020-04-20  7:54 ` [PATCH V2 mlx5-next 01/10] net/core: Introduce master_xmit_slave_get Maor Gottlieb
2020-04-20 14:01   ` Jiri Pirko
2020-04-20 17:29     ` David Ahern
2020-04-20 17:41       ` Jiri Pirko
2020-04-20 17:43         ` David Ahern
2020-04-20 17:53           ` Jiri Pirko
2020-04-20 17:54       ` Jiri Pirko
2020-04-20 17:56         ` David Ahern
2020-04-20 18:01           ` Jiri Pirko
2020-04-20 18:04             ` David Ahern
2020-04-20 18:48               ` Jiri Pirko
2020-04-20 18:56                 ` Maor Gottlieb
2020-04-20 19:02                   ` David Ahern
2020-04-21  5:37                     ` Jiri Pirko
2020-04-21  5:43                       ` Maor Gottlieb
2020-04-20  7:54 ` [PATCH V2 mlx5-next 02/10] bonding: Rename slave_arr to usable_slaves Maor Gottlieb
2020-04-20 14:17   ` Jiri Pirko
2020-04-20  7:54 ` [PATCH V2 mlx5-next 03/10] bonding: Add helpers to get xmit slave Maor Gottlieb
2020-04-20 14:27   ` Jiri Pirko
2020-04-20 18:46     ` Jay Vosburgh
2020-04-20 18:59       ` Maor Gottlieb
2020-04-20  7:54 ` [PATCH V2 mlx5-next 04/10] bonding: Implement ndo_xmit_slave_get Maor Gottlieb
2020-04-20 15:04   ` Jiri Pirko
2020-04-20 15:36   ` David Ahern
2020-04-20  7:54 ` [PATCH V2 mlx5-next 05/10] RDMA/core: Add LAG functionality Maor Gottlieb
2020-04-20  7:54 ` [PATCH V2 mlx5-next 06/10] RDMA/core: Get xmit slave for LAG Maor Gottlieb
2020-04-20  7:54 ` [PATCH V2 mlx5-next 07/10] net/mlx5: Change lag mutex lock to spin lock Maor Gottlieb
2020-04-20  7:54 ` [PATCH V2 mlx5-next 08/10] net/mlx5: Add support to get lag physical port Maor Gottlieb
2020-04-20  7:54 ` [PATCH V2 mlx5-next 09/10] RDMA/mlx5: Refactor affinity related code Maor Gottlieb
2020-04-20  7:54 ` [PATCH V2 mlx5-next 10/10] RDMA/mlx5: Set lag tx affinity according to slave Maor Gottlieb
