All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 00/10] net/mlx4: Add flow-steering support
@ 2012-07-01  9:43 Or Gerlitz
  2012-07-01  9:43 ` [PATCH net-next 01/10] net/mlx4_core: Change resource tracking mechanism to use red-black tree Or Gerlitz
                   ` (9 more replies)
  0 siblings, 10 replies; 45+ messages in thread
From: Or Gerlitz @ 2012-07-01  9:43 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, Or Gerlitz, Hadar Hen Zion

Hi Dave, 

This patch series from Hadar adds code to manage L2/L3/L4 network 
flow steering rules, a feature which is supported by the ConnectX-3 device.

The series is built as follows:

The first two patches deal with SRIOV resource tracker, whose mechanism 
is changed to use red-black tree instead of radix tree. The reason for 
this change is that the coming steering patches use flow IDs which are 64 
bits in size, where radix tree keys can't be 64bit on 32bit architecture, 
while RB tree can do that.

Patch #3 is little re-design of the Ethernet driver multicast attachments 
flow to be more efficient and robust.

The fourth patch does a re-org of the checks that deal with the current 
"older" steering modes such that we can easily add soon the new steering 
mode and the code remains easy to manage.

Patch #5 adds the firmware commands for the new steering mode, which is 
called "device managed flow steeering".

Patch 6 is the main patch of this series. It adds support for device-managed flow 
steering all across the place. We had to have this patch also to touch the mlx4 
IB driver, since the steering mode is global to the HCA -- so when being enabled, 
multicast attachment calls done by the IB driver into the mlx4 core driver, 
are now routed to the flow steering firmware commands whose API is a bit different, 
something that the IB driver had to be aware to. Following that, the 7th patch 
adds resource tracking for device-managed flow steering rules.

The 8th patch adds promiscuous mode support under device-managed flow steering,
next, the 9th patch adds implementation for the ethtool APIs for attaching 
L2/L3/L4 based flow steering rules, and the last patch adds support for drop 
action through ethtool.

Or.

Hadar Hen Zion (9):
  net/mlx4_core: Change resource tracking mechanism to use red-black tree
  net/mlx4_core: Change resource tracking ID to be 64 bit
  net/mlx4: Set steering mode according to device capabilities
  net/mlx4_core: Add firmware commands to support device managed flow steering
  {NET,IB}/mlx4: Add device managed flow steering firmware API
  net/mlx4_core: Add resource tracking for device managed flow steering rules
  net/mlx4: Implement promiscuous mode with device managed flow-steering
  net/mlx4_en: Manage flow steering rules with ethtool
  net/mlx4_en: Add support for drop action through ethtool

Yevgeny Petrilin (1):
  net/mlx4_en: Re-design multicast attachments flow

 drivers/infiniband/hw/mlx4/main.c                  |   62 ++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h               |    1 +
 drivers/infiniband/hw/mlx4/qp.c                    |    1 +
 drivers/net/ethernet/mellanox/mlx4/cmd.c           |   19 +
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c    |  373 ++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c     |  313 +++++++++---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         |   30 ++
 drivers/net/ethernet/mellanox/mlx4/fw.c            |   90 +++-
 drivers/net/ethernet/mellanox/mlx4/fw.h            |    3 +
 drivers/net/ethernet/mellanox/mlx4/main.c          |   60 ++-
 drivers/net/ethernet/mellanox/mlx4/mcg.c           |  524 ++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |   29 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h       |   28 +-
 drivers/net/ethernet/mellanox/mlx4/port.c          |  111 +++--
 drivers/net/ethernet/mellanox/mlx4/profile.c       |   12 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  284 +++++++++--
 include/linux/mlx4/cmd.h                           |    4 +
 include/linux/mlx4/device.h                        |  136 +++++-
 18 files changed, 1848 insertions(+), 232 deletions(-)

-- 
1.7.1

Cc: Hadar Hen Zion <hadarh@mellanox.co.il>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH net-next 01/10] net/mlx4_core: Change resource tracking mechanism to use red-black tree
  2012-07-01  9:43 [PATCH net-next 00/10] net/mlx4: Add flow-steering support Or Gerlitz
@ 2012-07-01  9:43 ` Or Gerlitz
  2012-07-01 10:17   ` David Miller
  2012-07-01  9:43 ` [PATCH net-next 02/10] net/mlx4_core: Change resource tracking ID to be 64 bit Or Gerlitz
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Or Gerlitz @ 2012-07-01  9:43 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, Hadar Hen Zion, Or Gerlitz

From: Hadar Hen Zion <hadarh@mellanox.co.il>

Change the data structure used for managing the SRIOV resource tracking
mechanism from radix tree to red-black tree. This is preparation step
for supporting resource IDs which are 64bit long, such as network flow
steering rules. Such IDs can't be used as radix-tree keys on 32bit
architectures and hence the reason for the change.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |    3 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  112 ++++++++++++++------
 2 files changed, 81 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index e5d2022..1a2f372 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -39,6 +39,7 @@
 
 #include <linux/mutex.h>
 #include <linux/radix-tree.h>
+#include <linux/rbtree.h>
 #include <linux/timer.h>
 #include <linux/semaphore.h>
 #include <linux/workqueue.h>
@@ -509,7 +510,7 @@ struct slave_list {
 struct mlx4_resource_tracker {
 	spinlock_t lock;
 	/* tree for each resources */
-	struct radix_tree_root res_tree[MLX4_NUM_OF_RESOURCE_TYPE];
+	struct rb_root res_tree[MLX4_NUM_OF_RESOURCE_TYPE];
 	/* num_of_slave's lists, one per slave */
 	struct slave_list *slave_list;
 };
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 766b8c5..6f89d44 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -57,6 +57,7 @@ struct mac_res {
 
 struct res_common {
 	struct list_head	list;
+	struct rb_node		node;
 	u32		        res_id;
 	int			owner;
 	int			state;
@@ -189,6 +190,49 @@ struct res_xrcdn {
 	int			port;
 };
 
+static struct res_common *res_tracker_lookup(struct rb_root *root, u64 res_id)
+{
+	struct rb_node *node = root->rb_node;
+
+	while (node) {
+		struct res_common *res = container_of(node, struct res_common,
+						      node);
+
+		if (res_id < res->res_id)
+			node = node->rb_left;
+		else if (res_id > res->res_id)
+			node = node->rb_right;
+		else
+			return res;
+	}
+	return NULL;
+}
+
+static int res_tracker_insert(struct rb_root *root, struct res_common *res)
+{
+	struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+	/* Figure out where to put new node */
+	while (*new) {
+		struct res_common *this = container_of(*new, struct res_common,
+						       node);
+
+		parent = *new;
+		if (res->res_id < this->res_id)
+			new = &((*new)->rb_left);
+		else if (res->res_id > this->res_id)
+			new = &((*new)->rb_right);
+		else
+			return -EEXIST;
+	}
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(&res->node, parent, new);
+	rb_insert_color(&res->node, root);
+
+	return 0;
+}
+
 /* For Debug uses */
 static const char *ResourceType(enum mlx4_resource rt)
 {
@@ -228,8 +272,7 @@ int mlx4_init_resource_tracker(struct mlx4_dev *dev)
 	mlx4_dbg(dev, "Started init_resource_tracker: %ld slaves\n",
 		 dev->num_slaves);
 	for (i = 0 ; i < MLX4_NUM_OF_RESOURCE_TYPE; i++)
-		INIT_RADIX_TREE(&priv->mfunc.master.res_tracker.res_tree[i],
-				GFP_ATOMIC|__GFP_NOWARN);
+		priv->mfunc.master.res_tracker.res_tree[i] = RB_ROOT;
 
 	spin_lock_init(&priv->mfunc.master.res_tracker.lock);
 	return 0 ;
@@ -272,13 +315,13 @@ static int mpt_mask(struct mlx4_dev *dev)
 	return dev->caps.num_mpts - 1;
 }
 
-static void *find_res(struct mlx4_dev *dev, int res_id,
-		      enum mlx4_resource type)
+static struct res_common *find_res(struct mlx4_dev *dev, int res_id,
+				   enum mlx4_resource type)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
 
-	return radix_tree_lookup(&priv->mfunc.master.res_tracker.res_tree[type],
-				 res_id);
+	return res_tracker_lookup(&priv->mfunc.master.res_tracker.res_tree[type],
+				  res_id);
 }
 
 static int get_res(struct mlx4_dev *dev, int slave, int res_id,
@@ -523,7 +566,7 @@ static int add_res_range(struct mlx4_dev *dev, int slave, int base, int count,
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct res_common **res_arr;
 	struct mlx4_resource_tracker *tracker = &priv->mfunc.master.res_tracker;
-	struct radix_tree_root *root = &tracker->res_tree[type];
+	struct rb_root *root = &tracker->res_tree[type];
 
 	res_arr = kzalloc(count * sizeof *res_arr, GFP_KERNEL);
 	if (!res_arr)
@@ -546,7 +589,7 @@ static int add_res_range(struct mlx4_dev *dev, int slave, int base, int count,
 			err = -EEXIST;
 			goto undo;
 		}
-		err = radix_tree_insert(root, base + i, res_arr[i]);
+		err = res_tracker_insert(root, res_arr[i]);
 		if (err)
 			goto undo;
 		list_add_tail(&res_arr[i]->list,
@@ -559,7 +602,7 @@ static int add_res_range(struct mlx4_dev *dev, int slave, int base, int count,
 
 undo:
 	for (--i; i >= base; --i)
-		radix_tree_delete(&tracker->res_tree[type], i);
+		rb_erase(&res_arr[i]->node, root);
 
 	spin_unlock_irq(mlx4_tlock(dev));
 
@@ -695,7 +738,7 @@ static int rem_res_range(struct mlx4_dev *dev, int slave, int base, int count,
 
 	spin_lock_irq(mlx4_tlock(dev));
 	for (i = base; i < base + count; ++i) {
-		r = radix_tree_lookup(&tracker->res_tree[type], i);
+		r = res_tracker_lookup(&tracker->res_tree[type], i);
 		if (!r) {
 			err = -ENOENT;
 			goto out;
@@ -710,8 +753,8 @@ static int rem_res_range(struct mlx4_dev *dev, int slave, int base, int count,
 	}
 
 	for (i = base; i < base + count; ++i) {
-		r = radix_tree_lookup(&tracker->res_tree[type], i);
-		radix_tree_delete(&tracker->res_tree[type], i);
+		r = res_tracker_lookup(&tracker->res_tree[type], i);
+		rb_erase(&r->node, &tracker->res_tree[type]);
 		list_del(&r->list);
 		kfree(r);
 	}
@@ -733,7 +776,7 @@ static int qp_res_start_move_to(struct mlx4_dev *dev, int slave, int qpn,
 	int err = 0;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[RES_QP], qpn);
+	r = (struct res_qp *)res_tracker_lookup(&tracker->res_tree[RES_QP], qpn);
 	if (!r)
 		err = -ENOENT;
 	else if (r->com.owner != slave)
@@ -797,7 +840,8 @@ static int mr_res_start_move_to(struct mlx4_dev *dev, int slave, int index,
 	int err = 0;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[RES_MPT], index);
+	r = (struct res_mpt *)res_tracker_lookup(&tracker->res_tree[RES_MPT],
+					     index);
 	if (!r)
 		err = -ENOENT;
 	else if (r->com.owner != slave)
@@ -850,7 +894,7 @@ static int eq_res_start_move_to(struct mlx4_dev *dev, int slave, int index,
 	int err = 0;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[RES_EQ], index);
+	r = (struct res_eq *)res_tracker_lookup(&tracker->res_tree[RES_EQ], index);
 	if (!r)
 		err = -ENOENT;
 	else if (r->com.owner != slave)
@@ -898,7 +942,7 @@ static int cq_res_start_move_to(struct mlx4_dev *dev, int slave, int cqn,
 	int err;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[RES_CQ], cqn);
+	r = (struct res_cq *)res_tracker_lookup(&tracker->res_tree[RES_CQ], cqn);
 	if (!r)
 		err = -ENOENT;
 	else if (r->com.owner != slave)
@@ -952,7 +996,8 @@ static int srq_res_start_move_to(struct mlx4_dev *dev, int slave, int index,
 	int err = 0;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[RES_SRQ], index);
+	r = (struct res_srq *)res_tracker_lookup(&tracker->res_tree[RES_SRQ],
+					     index);
 	if (!r)
 		err = -ENOENT;
 	else if (r->com.owner != slave)
@@ -1001,7 +1046,7 @@ static void res_abort_move(struct mlx4_dev *dev, int slave,
 	struct res_common *r;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[type], id);
+	r = res_tracker_lookup(&tracker->res_tree[type], id);
 	if (r && (r->owner == slave))
 		r->state = r->from_state;
 	spin_unlock_irq(mlx4_tlock(dev));
@@ -1015,7 +1060,7 @@ static void res_end_move(struct mlx4_dev *dev, int slave,
 	struct res_common *r;
 
 	spin_lock_irq(mlx4_tlock(dev));
-	r = radix_tree_lookup(&tracker->res_tree[type], id);
+	r = res_tracker_lookup(&tracker->res_tree[type], id);
 	if (r && (r->owner == slave))
 		r->state = r->to_state;
 	spin_unlock_irq(mlx4_tlock(dev));
@@ -2817,8 +2862,8 @@ static void rem_slave_qps(struct mlx4_dev *dev, int slave)
 				switch (state) {
 				case RES_QP_RESERVED:
 					spin_lock_irq(mlx4_tlock(dev));
-					radix_tree_delete(&tracker->res_tree[RES_QP],
-							  qp->com.res_id);
+					rb_erase(&qp->com.node,
+						 &tracker->res_tree[RES_QP]);
 					list_del(&qp->com.list);
 					spin_unlock_irq(mlx4_tlock(dev));
 					kfree(qp);
@@ -2888,8 +2933,8 @@ static void rem_slave_srqs(struct mlx4_dev *dev, int slave)
 				case RES_SRQ_ALLOCATED:
 					__mlx4_srq_free_icm(dev, srqn);
 					spin_lock_irq(mlx4_tlock(dev));
-					radix_tree_delete(&tracker->res_tree[RES_SRQ],
-							  srqn);
+					rb_erase(&srq->com.node,
+						 &tracker->res_tree[RES_SRQ]);
 					list_del(&srq->com.list);
 					spin_unlock_irq(mlx4_tlock(dev));
 					kfree(srq);
@@ -2954,8 +2999,8 @@ static void rem_slave_cqs(struct mlx4_dev *dev, int slave)
 				case RES_CQ_ALLOCATED:
 					__mlx4_cq_free_icm(dev, cqn);
 					spin_lock_irq(mlx4_tlock(dev));
-					radix_tree_delete(&tracker->res_tree[RES_CQ],
-							  cqn);
+					rb_erase(&cq->com.node,
+						 &tracker->res_tree[RES_CQ]);
 					list_del(&cq->com.list);
 					spin_unlock_irq(mlx4_tlock(dev));
 					kfree(cq);
@@ -3017,8 +3062,8 @@ static void rem_slave_mrs(struct mlx4_dev *dev, int slave)
 				case RES_MPT_RESERVED:
 					__mlx4_mr_release(dev, mpt->key);
 					spin_lock_irq(mlx4_tlock(dev));
-					radix_tree_delete(&tracker->res_tree[RES_MPT],
-							  mptn);
+					rb_erase(&mpt->com.node,
+						 &tracker->res_tree[RES_MPT]);
 					list_del(&mpt->com.list);
 					spin_unlock_irq(mlx4_tlock(dev));
 					kfree(mpt);
@@ -3086,8 +3131,8 @@ static void rem_slave_mtts(struct mlx4_dev *dev, int slave)
 					__mlx4_free_mtt_range(dev, base,
 							      mtt->order);
 					spin_lock_irq(mlx4_tlock(dev));
-					radix_tree_delete(&tracker->res_tree[RES_MTT],
-							  base);
+					rb_erase(&mtt->com.node,
+						 &tracker->res_tree[RES_MTT]);
 					list_del(&mtt->com.list);
 					spin_unlock_irq(mlx4_tlock(dev));
 					kfree(mtt);
@@ -3133,8 +3178,8 @@ static void rem_slave_eqs(struct mlx4_dev *dev, int slave)
 				switch (state) {
 				case RES_EQ_RESERVED:
 					spin_lock_irq(mlx4_tlock(dev));
-					radix_tree_delete(&tracker->res_tree[RES_EQ],
-							  eqn);
+					rb_erase(&eq->com.node,
+						 &tracker->res_tree[RES_EQ]);
 					list_del(&eq->com.list);
 					spin_unlock_irq(mlx4_tlock(dev));
 					kfree(eq);
@@ -3191,7 +3236,8 @@ static void rem_slave_counters(struct mlx4_dev *dev, int slave)
 	list_for_each_entry_safe(counter, tmp, counter_list, com.list) {
 		if (counter->com.owner == slave) {
 			index = counter->com.res_id;
-			radix_tree_delete(&tracker->res_tree[RES_COUNTER], index);
+			rb_erase(&counter->com.node,
+				 &tracker->res_tree[RES_COUNTER]);
 			list_del(&counter->com.list);
 			kfree(counter);
 			__mlx4_counter_free(dev, index);
@@ -3220,7 +3266,7 @@ static void rem_slave_xrcdns(struct mlx4_dev *dev, int slave)
 	list_for_each_entry_safe(xrcd, tmp, xrcdn_list, com.list) {
 		if (xrcd->com.owner == slave) {
 			xrcdn = xrcd->com.res_id;
-			radix_tree_delete(&tracker->res_tree[RES_XRCD], xrcdn);
+			rb_erase(&xrcd->com.node, &tracker->res_tree[RES_XRCD]);
 			list_del(&xrcd->com.list);
 			kfree(xrcd);
 			__mlx4_xrcd_free(dev, xrcdn);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH net-next 02/10] net/mlx4_core: Change resource tracking ID to be 64 bit
  2012-07-01  9:43 [PATCH net-next 00/10] net/mlx4: Add flow-steering support Or Gerlitz
  2012-07-01  9:43 ` [PATCH net-next 01/10] net/mlx4_core: Change resource tracking mechanism to use red-black tree Or Gerlitz
@ 2012-07-01  9:43 ` Or Gerlitz
  2012-07-01  9:43 ` [PATCH net-next 03/10] net/mlx4_en: Re-design multicast attachments flow Or Gerlitz
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Or Gerlitz @ 2012-07-01  9:43 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, Hadar Hen Zion, Or Gerlitz

From: Hadar Hen Zion <hadarh@mellanox.co.il>

Currently the IDs used by the resource tracker are of type u32, so far this was
ok since all the different resources we were tracking could be encoded in 32bit.

As a preparation step for tracking of resources whose IDs need > 32 bits such
as network flow steering rules, who are 64 bit in size, move to use 64 bit
based resource IDs.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |    2 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   28 ++++++++++----------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index 1a2f372..a425a98 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -1033,7 +1033,7 @@ int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port);
 /* resource tracker functions*/
 int mlx4_get_slave_from_resource_id(struct mlx4_dev *dev,
 				    enum mlx4_resource resource_type,
-				    int resource_id, int *slave);
+				    u64 resource_id, int *slave);
 void mlx4_delete_all_resources_for_slave(struct mlx4_dev *dev, int slave_id);
 int mlx4_init_resource_tracker(struct mlx4_dev *dev);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 6f89d44..b8e8969 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -58,7 +58,7 @@ struct mac_res {
 struct res_common {
 	struct list_head	list;
 	struct rb_node		node;
-	u32		        res_id;
+	u64		        res_id;
 	int			owner;
 	int			state;
 	int			from_state;
@@ -315,7 +315,7 @@ static int mpt_mask(struct mlx4_dev *dev)
 	return dev->caps.num_mpts - 1;
 }
 
-static struct res_common *find_res(struct mlx4_dev *dev, int res_id,
+static struct res_common *find_res(struct mlx4_dev *dev, u64 res_id,
 				   enum mlx4_resource type)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
@@ -324,7 +324,7 @@ static struct res_common *find_res(struct mlx4_dev *dev, int res_id,
 				  res_id);
 }
 
-static int get_res(struct mlx4_dev *dev, int slave, int res_id,
+static int get_res(struct mlx4_dev *dev, int slave, u64 res_id,
 		   enum mlx4_resource type,
 		   void *res)
 {
@@ -350,7 +350,7 @@ static int get_res(struct mlx4_dev *dev, int slave, int res_id,
 
 	r->from_state = r->state;
 	r->state = RES_ANY_BUSY;
-	mlx4_dbg(dev, "res %s id 0x%x to busy\n",
+	mlx4_dbg(dev, "res %s id 0x%llx to busy\n",
 		 ResourceType(type), r->res_id);
 
 	if (res)
@@ -363,7 +363,7 @@ exit:
 
 int mlx4_get_slave_from_resource_id(struct mlx4_dev *dev,
 				    enum mlx4_resource type,
-				    int res_id, int *slave)
+				    u64 res_id, int *slave)
 {
 
 	struct res_common *r;
@@ -384,7 +384,7 @@ int mlx4_get_slave_from_resource_id(struct mlx4_dev *dev,
 	return err;
 }
 
-static void put_res(struct mlx4_dev *dev, int slave, int res_id,
+static void put_res(struct mlx4_dev *dev, int slave, u64 res_id,
 		    enum mlx4_resource type)
 {
 	struct res_common *r;
@@ -516,7 +516,7 @@ static struct res_common *alloc_xrcdn_tr(int id)
 	return &ret->com;
 }
 
-static struct res_common *alloc_tr(int id, enum mlx4_resource type, int slave,
+static struct res_common *alloc_tr(u64 id, enum mlx4_resource type, int slave,
 				   int extra)
 {
 	struct res_common *ret;
@@ -558,7 +558,7 @@ static struct res_common *alloc_tr(int id, enum mlx4_resource type, int slave,
 	return ret;
 }
 
-static int add_res_range(struct mlx4_dev *dev, int slave, int base, int count,
+static int add_res_range(struct mlx4_dev *dev, int slave, u64 base, int count,
 			 enum mlx4_resource type, int extra)
 {
 	int i;
@@ -727,10 +727,10 @@ static int remove_ok(struct res_common *res, enum mlx4_resource type, int extra)
 	}
 }
 
-static int rem_res_range(struct mlx4_dev *dev, int slave, int base, int count,
+static int rem_res_range(struct mlx4_dev *dev, int slave, u64 base, int count,
 			 enum mlx4_resource type, int extra)
 {
-	int i;
+	u64 i;
 	int err;
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct mlx4_resource_tracker *tracker = &priv->mfunc.master.res_tracker;
@@ -784,7 +784,7 @@ static int qp_res_start_move_to(struct mlx4_dev *dev, int slave, int qpn,
 	else {
 		switch (state) {
 		case RES_QP_BUSY:
-			mlx4_dbg(dev, "%s: failed RES_QP, 0x%x\n",
+			mlx4_dbg(dev, "%s: failed RES_QP, 0x%llx\n",
 				 __func__, r->com.res_id);
 			err = -EBUSY;
 			break;
@@ -793,7 +793,7 @@ static int qp_res_start_move_to(struct mlx4_dev *dev, int slave, int qpn,
 			if (r->com.state == RES_QP_MAPPED && !alloc)
 				break;
 
-			mlx4_dbg(dev, "failed RES_QP, 0x%x\n", r->com.res_id);
+			mlx4_dbg(dev, "failed RES_QP, 0x%llx\n", r->com.res_id);
 			err = -EINVAL;
 			break;
 
@@ -802,7 +802,7 @@ static int qp_res_start_move_to(struct mlx4_dev *dev, int slave, int qpn,
 			    r->com.state == RES_QP_HW)
 				break;
 			else {
-				mlx4_dbg(dev, "failed RES_QP, 0x%x\n",
+				mlx4_dbg(dev, "failed RES_QP, 0x%llx\n",
 					  r->com.res_id);
 				err = -EINVAL;
 			}
@@ -2796,7 +2796,7 @@ static int _move_all_busy(struct mlx4_dev *dev, int slave,
 				if (r->state == RES_ANY_BUSY) {
 					if (print)
 						mlx4_dbg(dev,
-							 "%s id 0x%x is busy\n",
+							 "%s id 0x%llx is busy\n",
 							  ResourceType(type),
 							  r->res_id);
 					++busy;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH net-next 03/10] net/mlx4_en: Re-design multicast attachments flow
  2012-07-01  9:43 [PATCH net-next 00/10] net/mlx4: Add flow-steering support Or Gerlitz
  2012-07-01  9:43 ` [PATCH net-next 01/10] net/mlx4_core: Change resource tracking mechanism to use red-black tree Or Gerlitz
  2012-07-01  9:43 ` [PATCH net-next 02/10] net/mlx4_core: Change resource tracking ID to be 64 bit Or Gerlitz
@ 2012-07-01  9:43 ` Or Gerlitz
  2012-07-01 10:32   ` David Miller
  2012-07-01  9:43 ` [PATCH net-next 04/10] net/mlx4: Set steering mode according to device capabilities Or Gerlitz
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Or Gerlitz @ 2012-07-01  9:43 UTC (permalink / raw)
  To: davem
  Cc: roland, yevgenyp, oren, netdev, Yevgeny Petrilin, Shawn Bohrer,
	Hadar Hen Zion

From: Yevgeny Petrilin <yevgenyp@mellanox.co.il>

Currently, for every change in the net device multicast list, the driver
detaches all the addresses from the HW device, and then attaches the
updated list. This behavior is wrong from two aspects: first, it causes
a load of firmware commands and second, there is period of time where
the correct addresses are not attached, which turned into packet loss.

To improve - a copy of the multicast list is saved by the driver. For
every change in the multicast list, the multicast list copy is used
to find the delta between those two lists and add or remove multicast
addresses as needed. 

Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  141 ++++++++++++++++++------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |   16 +++-
 2 files changed, 122 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 073b85b..35c64f2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -170,33 +170,79 @@ static void mlx4_en_do_set_mac(struct work_struct *work)
 static void mlx4_en_clear_list(struct net_device *dev)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_en_mc_list *tmp, *mc_to_del;
 
-	kfree(priv->mc_addrs);
-	priv->mc_addrs = NULL;
-	priv->mc_addrs_cnt = 0;
+	list_for_each_entry_safe(mc_to_del, tmp, &priv->mc_list, list) {
+		list_del(&mc_to_del->list);
+		kfree(mc_to_del);
+	}
 }
 
 static void mlx4_en_cache_mclist(struct net_device *dev)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct netdev_hw_addr *ha;
-	char *mc_addrs;
-	int mc_addrs_cnt = netdev_mc_count(dev);
-	int i;
+	struct mlx4_en_mc_list *tmp;
 
-	mc_addrs = kmalloc(mc_addrs_cnt * ETH_ALEN, GFP_ATOMIC);
-	if (!mc_addrs) {
-		en_err(priv, "failed to allocate multicast list\n");
-		return;
-	}
-	i = 0;
-	netdev_for_each_mc_addr(ha, dev)
-		memcpy(mc_addrs + i++ * ETH_ALEN, ha->addr, ETH_ALEN);
 	mlx4_en_clear_list(dev);
-	priv->mc_addrs = mc_addrs;
-	priv->mc_addrs_cnt = mc_addrs_cnt;
+	netdev_for_each_mc_addr(ha, dev) {
+		tmp = kzalloc(sizeof(struct mlx4_en_mc_list), GFP_ATOMIC);
+		if (!tmp) {
+			en_err(priv, "failed to allocate multicast list\n");
+			mlx4_en_clear_list(dev);
+			return;
+		}
+		memcpy(tmp->addr, ha->addr, ETH_ALEN);
+		list_add_tail(&tmp->list, &priv->mc_list);
+	}
 }
 
+static void update_mclist_flags(struct mlx4_en_priv *priv,
+				struct list_head *dst,
+				struct list_head *src)
+{
+	struct mlx4_en_mc_list *dst_tmp, *src_tmp, *new_mc;
+	bool found;
+
+	/* Find all the entries that should be removed from dst,
+	 * These are the entries that are not found in src */
+	list_for_each_entry(dst_tmp, dst, list) {
+		found = false;
+		list_for_each_entry(src_tmp, src, list) {
+			if (!memcmp(dst_tmp->addr, src_tmp->addr, ETH_ALEN)) {
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			dst_tmp->action = MCLIST_REM;
+	}
+
+	/* Add entries that exist in src but not in dst
+	 * mark them as need to add */
+	list_for_each_entry(src_tmp, src, list) {
+		found = false;
+		list_for_each_entry(dst_tmp, dst, list) {
+			if (!memcmp(dst_tmp->addr, src_tmp->addr, ETH_ALEN)) {
+				dst_tmp->action = MCLIST_NONE;
+				found = true;
+				break;
+			}
+		}
+		if (!found) {
+			new_mc = kmalloc(sizeof(struct mlx4_en_mc_list),
+					     GFP_KERNEL);
+			if (!new_mc) {
+				en_err(priv, "Failed to allocate current multicast list\n");
+				return;
+			}
+			memcpy(new_mc, src_tmp,
+			       sizeof(struct mlx4_en_mc_list));
+			new_mc->action = MCLIST_ADD;
+			list_add_tail(&new_mc->list, dst);
+		}
+	}
+}
 
 static void mlx4_en_set_multicast(struct net_device *dev)
 {
@@ -214,6 +260,7 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 						 mcast_task);
 	struct mlx4_en_dev *mdev = priv->mdev;
 	struct net_device *dev = priv->dev;
+	struct mlx4_en_mc_list *mclist, *tmp;
 	u64 mcast_addr = 0;
 	u8 mc_list[16] = {0};
 	int err;
@@ -336,7 +383,6 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 			priv->flags |= MLX4_EN_FLAG_MC_PROMISC;
 		}
 	} else {
-		int i;
 		/* Disable Multicast promisc */
 		if (priv->flags & MLX4_EN_FLAG_MC_PROMISC) {
 			err = mlx4_multicast_promisc_remove(mdev->dev, priv->base_qpn,
@@ -351,13 +397,6 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 		if (err)
 			en_err(priv, "Failed disabling multicast filter\n");
 
-		/* Detach our qp from all the multicast addresses */
-		for (i = 0; i < priv->mc_addrs_cnt; i++) {
-			memcpy(&mc_list[10], priv->mc_addrs + i * ETH_ALEN, ETH_ALEN);
-			mc_list[5] = priv->port;
-			mlx4_multicast_detach(mdev->dev, &priv->rss_map.indir_qp,
-					      mc_list, MLX4_PROT_ETH);
-		}
 		/* Flush mcast filter and init it with broadcast address */
 		mlx4_SET_MCAST_FLTR(mdev->dev, priv->port, ETH_BCAST,
 				    1, MLX4_MCAST_CONFIG);
@@ -367,13 +406,8 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 		netif_tx_lock_bh(dev);
 		mlx4_en_cache_mclist(dev);
 		netif_tx_unlock_bh(dev);
-		for (i = 0; i < priv->mc_addrs_cnt; i++) {
-			mcast_addr =
-			      mlx4_en_mac_to_u64(priv->mc_addrs + i * ETH_ALEN);
-			memcpy(&mc_list[10], priv->mc_addrs + i * ETH_ALEN, ETH_ALEN);
-			mc_list[5] = priv->port;
-			mlx4_multicast_attach(mdev->dev, &priv->rss_map.indir_qp,
-					      mc_list, 0, MLX4_PROT_ETH);
+		list_for_each_entry(mclist, &priv->mc_list, list) {
+			mcast_addr = mlx4_en_mac_to_u64(mclist->addr);
 			mlx4_SET_MCAST_FLTR(mdev->dev, priv->port,
 					    mcast_addr, 0, MLX4_MCAST_CONFIG);
 		}
@@ -381,6 +415,38 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 					  0, MLX4_MCAST_ENABLE);
 		if (err)
 			en_err(priv, "Failed enabling multicast filter\n");
+
+		update_mclist_flags(priv, &priv->curr_list, &priv->mc_list);
+		list_for_each_entry_safe(mclist, tmp, &priv->curr_list, list) {
+			if (mclist->action == MCLIST_REM) {
+				/* detach this address and delete from list */
+				memcpy(&mc_list[10], mclist->addr, ETH_ALEN);
+				mc_list[5] = priv->port;
+				err = mlx4_multicast_detach(mdev->dev,
+							    &priv->rss_map.indir_qp,
+							    mc_list,
+							    MLX4_PROT_ETH);
+				if (err)
+					en_err(priv, "Fail to detach multicast address\n");
+
+				/* remove from list */
+				list_del(&mclist->list);
+				kfree(mclist);
+			}
+
+			if (mclist->action == MCLIST_ADD) {
+				/* attach the address */
+				memcpy(&mc_list[10], mclist->addr, ETH_ALEN);
+				mc_list[5] = priv->port;
+				err = mlx4_multicast_attach(mdev->dev,
+							    &priv->rss_map.indir_qp,
+							    mc_list, 0,
+							    MLX4_PROT_ETH);
+				if (err)
+					en_err(priv, "Fail to attach multicast address\n");
+
+			}
+		}
 	}
 out:
 	mutex_unlock(&mdev->state_lock);
@@ -605,6 +671,9 @@ int mlx4_en_start_port(struct net_device *dev)
 		return 0;
 	}
 
+	INIT_LIST_HEAD(&priv->mc_list);
+	INIT_LIST_HEAD(&priv->curr_list);
+
 	/* Calculate Rx buf size */
 	dev->mtu = min(dev->mtu, priv->max_mtu);
 	mlx4_en_calc_rx_buf(dev);
@@ -760,6 +829,7 @@ void mlx4_en_stop_port(struct net_device *dev)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_en_dev *mdev = priv->mdev;
+	struct mlx4_en_mc_list *mclist, *tmp;
 	int i;
 	u8 mc_list[16] = {0};
 
@@ -781,13 +851,18 @@ void mlx4_en_stop_port(struct net_device *dev)
 	mc_list[5] = priv->port;
 	mlx4_multicast_detach(mdev->dev, &priv->rss_map.indir_qp, mc_list,
 			      MLX4_PROT_ETH);
-	for (i = 0; i < priv->mc_addrs_cnt; i++) {
-		memcpy(&mc_list[10], priv->mc_addrs + i * ETH_ALEN, ETH_ALEN);
+	list_for_each_entry(mclist, &priv->curr_list, list) {
+		memcpy(&mc_list[10], mclist->addr, ETH_ALEN);
 		mc_list[5] = priv->port;
 		mlx4_multicast_detach(mdev->dev, &priv->rss_map.indir_qp,
 				      mc_list, MLX4_PROT_ETH);
 	}
 	mlx4_en_clear_list(dev);
+	list_for_each_entry_safe(mclist, tmp, &priv->curr_list, list) {
+		list_del(&mclist->list);
+		kfree(mclist);
+	}
+
 	/* Flush multicast filter */
 	mlx4_SET_MCAST_FLTR(mdev->dev, priv->port, 0, 1, MLX4_MCAST_CONFIG);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 225c20d..1bb00cd 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -404,6 +404,18 @@ struct mlx4_en_perf_stats {
 #define NUM_PERF_COUNTERS		6
 };
 
+enum mlx4_en_mclist_act {
+	MCLIST_NONE,
+	MCLIST_REM,
+	MCLIST_ADD,
+};
+
+struct mlx4_en_mc_list {
+	struct list_head	list;
+	enum mlx4_en_mclist_act	action;
+	u8			addr[ETH_ALEN];
+};
+
 struct mlx4_en_frag_info {
 	u16 frag_size;
 	u16 frag_prefix_size;
@@ -489,8 +501,8 @@ struct mlx4_en_priv {
 	struct mlx4_en_pkt_stats pkstats;
 	struct mlx4_en_port_stats port_stats;
 	u64 stats_bitmap;
-	char *mc_addrs;
-	int mc_addrs_cnt;
+	struct list_head mc_list;
+	struct list_head curr_list;
 	struct mlx4_en_stat_out_mbox hw_stats;
 	int vids[128];
 	bool wol;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH net-next 04/10] net/mlx4: Set steering mode according to device capabilities
  2012-07-01  9:43 [PATCH net-next 00/10] net/mlx4: Add flow-steering support Or Gerlitz
                   ` (2 preceding siblings ...)
  2012-07-01  9:43 ` [PATCH net-next 03/10] net/mlx4_en: Re-design multicast attachments flow Or Gerlitz
@ 2012-07-01  9:43 ` Or Gerlitz
  2012-07-01 10:20   ` David Miller
  2012-07-01  9:43 ` [PATCH net-next 05/10] net/mlx4_core: Add firmware commands to support device managed flow steering Or Gerlitz
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Or Gerlitz @ 2012-07-01  9:43 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, Hadar Hen Zion, Or Gerlitz

From: Hadar Hen Zion <hadarh@mellanox.co.il>

Instead of checking the firmware supported steering mode in various
places in the code, add a dedicated field in the mlx4 device capabilities
structure which is written once during the initialization flow and read
across the code.

This also set the grounds for add new steering modes. Currently two modes
are supported, and are named after the ConnectX HW versions A0 and B0.

A0 steering uses mac_index, vlan_index and priority to steer traffic
into pre-defined range of QPs.

B0 steering uses Ethernet L2 hashing rules and is enabled only
if the firmware supports both unicast and multicast B0 steering,

The current steering modes are relevant for Ethernet traffic only,
such that Infiniband steering remains untouched.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  107 ++++++++++++++++--------
 drivers/net/ethernet/mellanox/mlx4/fw.c        |    2 +-
 drivers/net/ethernet/mellanox/mlx4/main.c      |   17 ++++-
 drivers/net/ethernet/mellanox/mlx4/mcg.c       |   70 +++++++---------
 drivers/net/ethernet/mellanox/mlx4/port.c      |    9 +-
 include/linux/mlx4/device.h                    |   24 +++++
 6 files changed, 148 insertions(+), 81 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 35c64f2..f9a4a00 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -263,7 +263,7 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 	struct mlx4_en_mc_list *mclist, *tmp;
 	u64 mcast_addr = 0;
 	u8 mc_list[16] = {0};
-	int err;
+	int err = 0;
 
 	mutex_lock(&mdev->state_lock);
 	if (!mdev->device_up) {
@@ -298,16 +298,35 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 			priv->flags |= MLX4_EN_FLAG_PROMISC;
 
 			/* Enable promiscouos mode */
-			if (!(mdev->dev->caps.flags &
-						MLX4_DEV_CAP_FLAG_VEP_UC_STEER))
-				err = mlx4_SET_PORT_qpn_calc(mdev->dev, priv->port,
-							     priv->base_qpn, 1);
-			else
-				err = mlx4_unicast_promisc_add(mdev->dev, priv->base_qpn,
+			switch (mdev->dev->caps.steering_mode) {
+			case MLX4_STEERING_MODE_B0:
+				err = mlx4_unicast_promisc_add(mdev->dev,
+							       priv->base_qpn,
 							       priv->port);
-			if (err)
-				en_err(priv, "Failed enabling "
-					     "promiscuous mode\n");
+				if (err)
+					en_err(priv, "Failed enabling unicast promiscuous mode\n");
+
+				/* Add the default qp number as multicast
+				* promisc */
+				if (!(priv->flags & MLX4_EN_FLAG_MC_PROMISC)) {
+					err = mlx4_multicast_promisc_add(mdev->dev,
+									 priv->base_qpn,
+									 priv->port);
+					if (err)
+						en_err(priv, "Failed enabling multicast promiscuous mode\n");
+					priv->flags |= MLX4_EN_FLAG_MC_PROMISC;
+				}
+				break;
+
+			case MLX4_STEERING_MODE_A0:
+				err = mlx4_SET_PORT_qpn_calc(mdev->dev,
+							     priv->port,
+							     priv->base_qpn,
+							     1);
+				if (err)
+					en_err(priv, "Failed enabling promiscuous mode\n");
+				break;
+			}
 
 			/* Disable port multicast filter (unconditionally) */
 			err = mlx4_SET_MCAST_FLTR(mdev->dev, priv->port, 0,
@@ -316,15 +335,6 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 				en_err(priv, "Failed disabling "
 					     "multicast filter\n");
 
-			/* Add the default qp number as multicast promisc */
-			if (!(priv->flags & MLX4_EN_FLAG_MC_PROMISC)) {
-				err = mlx4_multicast_promisc_add(mdev->dev, priv->base_qpn,
-								 priv->port);
-				if (err)
-					en_err(priv, "Failed entering multicast promisc mode\n");
-				priv->flags |= MLX4_EN_FLAG_MC_PROMISC;
-			}
-
 			/* Disable port VLAN filter */
 			err = mlx4_SET_VLAN_FLTR(mdev->dev, priv);
 			if (err)
@@ -343,22 +353,31 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 		priv->flags &= ~MLX4_EN_FLAG_PROMISC;
 
 		/* Disable promiscouos mode */
-		if (!(mdev->dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER))
-			err = mlx4_SET_PORT_qpn_calc(mdev->dev, priv->port,
-						     priv->base_qpn, 0);
-		else
-			err = mlx4_unicast_promisc_remove(mdev->dev, priv->base_qpn,
+		switch (mdev->dev->caps.steering_mode) {
+		case MLX4_STEERING_MODE_B0:
+			err = mlx4_unicast_promisc_remove(mdev->dev,
+							  priv->base_qpn,
 							  priv->port);
-		if (err)
-			en_err(priv, "Failed disabling promiscuous mode\n");
+			if (err)
+				en_err(priv, "Failed disabling unicast promiscuous mode\n");
+			/* Disable Multicast promisc */
+			if (priv->flags & MLX4_EN_FLAG_MC_PROMISC) {
+				err = mlx4_multicast_promisc_remove(mdev->dev,
+								    priv->base_qpn,
+								    priv->port);
+				if (err)
+					en_err(priv, "Failed disabling multicast promiscuous mode\n");
+				priv->flags &= ~MLX4_EN_FLAG_MC_PROMISC;
+			}
+			break;
 
-		/* Disable Multicast promisc */
-		if (priv->flags & MLX4_EN_FLAG_MC_PROMISC) {
-			err = mlx4_multicast_promisc_remove(mdev->dev, priv->base_qpn,
-							    priv->port);
+		case MLX4_STEERING_MODE_A0:
+			err = mlx4_SET_PORT_qpn_calc(mdev->dev,
+						     priv->port,
+						     priv->base_qpn, 0);
 			if (err)
-				en_err(priv, "Failed disabling multicast promiscuous mode\n");
-			priv->flags &= ~MLX4_EN_FLAG_MC_PROMISC;
+				en_err(priv, "Failed disabling promiscuous mode\n");
+			break;
 		}
 
 		/* Enable port VLAN filter */
@@ -376,8 +395,16 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 
 		/* Add the default qp number as multicast promisc */
 		if (!(priv->flags & MLX4_EN_FLAG_MC_PROMISC)) {
-			err = mlx4_multicast_promisc_add(mdev->dev, priv->base_qpn,
-							 priv->port);
+			switch (mdev->dev->caps.steering_mode) {
+			case MLX4_STEERING_MODE_B0:
+				err = mlx4_multicast_promisc_add(mdev->dev,
+								 priv->base_qpn,
+								 priv->port);
+				break;
+
+			case MLX4_STEERING_MODE_A0:
+				break;
+			}
 			if (err)
 				en_err(priv, "Failed entering multicast promisc mode\n");
 			priv->flags |= MLX4_EN_FLAG_MC_PROMISC;
@@ -385,8 +412,16 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 	} else {
 		/* Disable Multicast promisc */
 		if (priv->flags & MLX4_EN_FLAG_MC_PROMISC) {
-			err = mlx4_multicast_promisc_remove(mdev->dev, priv->base_qpn,
-							    priv->port);
+			switch (mdev->dev->caps.steering_mode) {
+			case MLX4_STEERING_MODE_B0:
+				err = mlx4_multicast_promisc_remove(mdev->dev,
+								    priv->base_qpn,
+								    priv->port);
+				break;
+
+			case MLX4_STEERING_MODE_A0:
+				break;
+			}
 			if (err)
 				en_err(priv, "Failed disabling multicast promiscuous mode\n");
 			priv->flags &= ~MLX4_EN_FLAG_MC_PROMISC;
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 9c83bb8..40e048b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -1124,7 +1124,7 @@ int mlx4_INIT_HCA(struct mlx4_dev *dev, struct mlx4_init_hca_param *param)
 	MLX4_PUT(inbox, param->mc_base,		INIT_HCA_MC_BASE_OFFSET);
 	MLX4_PUT(inbox, param->log_mc_entry_sz, INIT_HCA_LOG_MC_ENTRY_SZ_OFFSET);
 	MLX4_PUT(inbox, param->log_mc_hash_sz,  INIT_HCA_LOG_MC_HASH_SZ_OFFSET);
-	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER)
+	if (dev->caps.steering_mode == MLX4_STEERING_MODE_B0)
 		MLX4_PUT(inbox, (u8) (1 << 3),	INIT_HCA_UC_STEERING_OFFSET);
 	MLX4_PUT(inbox, param->log_mc_table_sz, INIT_HCA_LOG_MC_TABLE_SZ_OFFSET);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index a0313de..7df0517 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -243,7 +243,6 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev->caps.reserved_srqs	     = dev_cap->reserved_srqs;
 	dev->caps.max_sq_desc_sz     = dev_cap->max_sq_desc_sz;
 	dev->caps.max_rq_desc_sz     = dev_cap->max_rq_desc_sz;
-	dev->caps.num_qp_per_mgm     = mlx4_get_qp_per_mgm(dev);
 	/*
 	 * Subtract 1 from the limit because we need to allocate a
 	 * spare CQE so the HCA HW can tell the difference between an
@@ -274,6 +273,22 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev->caps.max_gso_sz	     = dev_cap->max_gso_sz;
 	dev->caps.max_rss_tbl_sz     = dev_cap->max_rss_tbl_sz;
 
+	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER &&
+	    dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER) {
+		dev->caps.steering_mode = MLX4_STEERING_MODE_B0;
+
+	} else {
+		dev->caps.steering_mode = MLX4_STEERING_MODE_A0;
+
+		if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER ||
+		    dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER)
+			mlx4_warn(dev, "Must have UC_STEER and MC_STEER flags "
+				       "set to use B0 steering. Falling back to A0 steering mode.\n");
+	}
+	mlx4_dbg(dev, "Steering mode is: %s\n",
+		 mlx4_steering_mode_str(dev->caps.steering_mode));
+	dev->caps.num_qp_per_mgm = mlx4_get_qp_per_mgm(dev);
+
 	/* Sense port always allowed on supported devices for ConnectX1 and 2 */
 	if (dev->pdev->device != 0x1003)
 		dev->caps.flags |= MLX4_DEV_CAP_FLAG_SENSE_SUPPORT;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index f4a8f98..319c9d4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -868,36 +868,50 @@ static int mlx4_QP_ATTACH(struct mlx4_dev *dev, struct mlx4_qp *qp,
 int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
 			  int block_mcast_loopback, enum mlx4_protocol prot)
 {
-	if (prot == MLX4_PROT_ETH &&
-			!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER))
-		return 0;
 
-	if (prot == MLX4_PROT_ETH)
-		gid[7] |= (MLX4_MC_STEER << 1);
+	switch (dev->caps.steering_mode) {
+	case MLX4_STEERING_MODE_A0:
+		if (prot == MLX4_PROT_ETH)
+			return 0;
 
-	if (mlx4_is_mfunc(dev))
-		return mlx4_QP_ATTACH(dev, qp, gid, 1,
-					block_mcast_loopback, prot);
+	case MLX4_STEERING_MODE_B0:
+		if (prot == MLX4_PROT_ETH)
+			gid[7] |= (MLX4_MC_STEER << 1);
 
-	return mlx4_qp_attach_common(dev, qp, gid, block_mcast_loopback,
-					prot, MLX4_MC_STEER);
+		if (mlx4_is_mfunc(dev))
+			return mlx4_QP_ATTACH(dev, qp, gid, 1,
+					      block_mcast_loopback, prot);
+		return mlx4_qp_attach_common(dev, qp, gid,
+					     block_mcast_loopback, prot,
+					     MLX4_MC_STEER);
+
+	default:
+		return -EINVAL;
+	}
 }
 EXPORT_SYMBOL_GPL(mlx4_multicast_attach);
 
 int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
 			  enum mlx4_protocol prot)
 {
-	if (prot == MLX4_PROT_ETH &&
-			!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER))
-		return 0;
+	switch (dev->caps.steering_mode) {
+	case MLX4_STEERING_MODE_A0:
+		if (prot == MLX4_PROT_ETH)
+			return 0;
 
-	if (prot == MLX4_PROT_ETH)
-		gid[7] |= (MLX4_MC_STEER << 1);
+	case MLX4_STEERING_MODE_B0:
+		if (prot == MLX4_PROT_ETH)
+			gid[7] |= (MLX4_MC_STEER << 1);
 
-	if (mlx4_is_mfunc(dev))
-		return mlx4_QP_ATTACH(dev, qp, gid, 0, 0, prot);
+		if (mlx4_is_mfunc(dev))
+			return mlx4_QP_ATTACH(dev, qp, gid, 0, 0, prot);
+
+		return mlx4_qp_detach_common(dev, qp, gid, prot,
+					     MLX4_MC_STEER);
 
-	return mlx4_qp_detach_common(dev, qp, gid, prot, MLX4_MC_STEER);
+	default:
+		return -EINVAL;
+	}
 }
 EXPORT_SYMBOL_GPL(mlx4_multicast_detach);
 
@@ -905,10 +919,6 @@ int mlx4_unicast_attach(struct mlx4_dev *dev,
 			struct mlx4_qp *qp, u8 gid[16],
 			int block_mcast_loopback, enum mlx4_protocol prot)
 {
-	if (prot == MLX4_PROT_ETH &&
-			!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER))
-		return 0;
-
 	if (prot == MLX4_PROT_ETH)
 		gid[7] |= (MLX4_UC_STEER << 1);
 
@@ -924,10 +934,6 @@ EXPORT_SYMBOL_GPL(mlx4_unicast_attach);
 int mlx4_unicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp,
 			       u8 gid[16], enum mlx4_protocol prot)
 {
-	if (prot == MLX4_PROT_ETH &&
-			!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER))
-		return 0;
-
 	if (prot == MLX4_PROT_ETH)
 		gid[7] |= (MLX4_UC_STEER << 1);
 
@@ -968,9 +974,6 @@ static int mlx4_PROMISC(struct mlx4_dev *dev, u32 qpn,
 
 int mlx4_multicast_promisc_add(struct mlx4_dev *dev, u32 qpn, u8 port)
 {
-	if (!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER))
-		return 0;
-
 	if (mlx4_is_mfunc(dev))
 		return mlx4_PROMISC(dev, qpn, MLX4_MC_STEER, 1, port);
 
@@ -980,9 +983,6 @@ EXPORT_SYMBOL_GPL(mlx4_multicast_promisc_add);
 
 int mlx4_multicast_promisc_remove(struct mlx4_dev *dev, u32 qpn, u8 port)
 {
-	if (!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER))
-		return 0;
-
 	if (mlx4_is_mfunc(dev))
 		return mlx4_PROMISC(dev, qpn, MLX4_MC_STEER, 0, port);
 
@@ -992,9 +992,6 @@ EXPORT_SYMBOL_GPL(mlx4_multicast_promisc_remove);
 
 int mlx4_unicast_promisc_add(struct mlx4_dev *dev, u32 qpn, u8 port)
 {
-	if (!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER))
-		return 0;
-
 	if (mlx4_is_mfunc(dev))
 		return mlx4_PROMISC(dev, qpn, MLX4_UC_STEER, 1, port);
 
@@ -1004,9 +1001,6 @@ EXPORT_SYMBOL_GPL(mlx4_unicast_promisc_add);
 
 int mlx4_unicast_promisc_remove(struct mlx4_dev *dev, u32 qpn, u8 port)
 {
-	if (!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER))
-		return 0;
-
 	if (mlx4_is_mfunc(dev))
 		return mlx4_PROMISC(dev, qpn, MLX4_UC_STEER, 0, port);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
index a8fb529..58de723 100644
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -155,7 +155,7 @@ int mlx4_get_eth_qp(struct mlx4_dev *dev, u8 port, u64 mac, int *qpn)
 		return err;
 	}
 
-	if (!(dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER)) {
+	if (dev->caps.steering_mode == MLX4_STEERING_MODE_A0) {
 		*qpn = info->base_qpn + index;
 		return 0;
 	}
@@ -206,7 +206,7 @@ void mlx4_put_eth_qp(struct mlx4_dev *dev, u8 port, u64 mac, int qpn)
 		 (unsigned long long) mac);
 	mlx4_unregister_mac(dev, port, mac);
 
-	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER) {
+	if (dev->caps.steering_mode != MLX4_STEERING_MODE_A0) {
 		entry = radix_tree_lookup(&info->mac_tree, qpn);
 		if (entry) {
 			mlx4_dbg(dev, "Releasing qp: port %d, mac 0x%llx,"
@@ -359,7 +359,7 @@ int mlx4_replace_mac(struct mlx4_dev *dev, u8 port, int qpn, u64 new_mac)
 	int index = qpn - info->base_qpn;
 	int err = 0;
 
-	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER) {
+	if (dev->caps.steering_mode != MLX4_STEERING_MODE_A0) {
 		entry = radix_tree_lookup(&info->mac_tree, qpn);
 		if (!entry)
 			return -EINVAL;
@@ -803,8 +803,7 @@ int mlx4_SET_PORT_qpn_calc(struct mlx4_dev *dev, u8 port, u32 base_qpn,
 	u32 m_promisc = (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER) ?
 		MCAST_DIRECT : MCAST_DEFAULT;
 
-	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER  &&
-	    dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER)
+	if (dev->caps.steering_mode != MLX4_STEERING_MODE_A0)
 		return 0;
 
 	mailbox = mlx4_alloc_cmd_mailbox(dev);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 6a8f002..80891a4 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -70,6 +70,29 @@ enum {
 	MLX4_MFUNC_EQE_MASK     = (MLX4_MFUNC_MAX_EQES - 1)
 };
 
+/* Driver supports 2 diffrent device methods to manage traffic steering:
+	- B0 steering mode - Common low level API for ib and (if supported) eth.
+	- A0 steering mode - Limited low level API for eth. In case of IB,
+			     B0 mode is in use.
+ */
+enum {
+	MLX4_STEERING_MODE_A0,
+	MLX4_STEERING_MODE_B0
+};
+
+static inline const char *mlx4_steering_mode_str(int steering_mode)
+{
+	switch (steering_mode) {
+	case MLX4_STEERING_MODE_A0:
+		return "A0 steering";
+
+	case MLX4_STEERING_MODE_B0:
+		return "B0 steering";
+	default:
+		return "Unrecognize steering mode";
+	}
+}
+
 enum {
 	MLX4_DEV_CAP_FLAG_RC		= 1LL <<  0,
 	MLX4_DEV_CAP_FLAG_UC		= 1LL <<  1,
@@ -295,6 +318,7 @@ struct mlx4_caps {
 	int			num_amgms;
 	int			reserved_mcgs;
 	int			num_qp_per_mgm;
+	int			steering_mode;
 	int			num_pds;
 	int			reserved_pds;
 	int			max_xrcds;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH net-next 05/10] net/mlx4_core: Add firmware commands to support device managed flow steering
  2012-07-01  9:43 [PATCH net-next 00/10] net/mlx4: Add flow-steering support Or Gerlitz
                   ` (3 preceding siblings ...)
  2012-07-01  9:43 ` [PATCH net-next 04/10] net/mlx4: Set steering mode according to device capabilities Or Gerlitz
@ 2012-07-01  9:43 ` Or Gerlitz
  2012-07-01  9:43 ` [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API Or Gerlitz
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Or Gerlitz @ 2012-07-01  9:43 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, Hadar Hen Zion, Or Gerlitz

From: Hadar Hen Zion <hadarh@mellanox.co.il>

Add support for firmware commands to attach/detach a new device managed
steering mode. Such network steering rules allow the user to provide an
L2/L3/L4 flow specification to the firmware and have the device to steer
traffic that matches that specification to the provided QP.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c           |   19 +++++++++++++
 drivers/net/ethernet/mellanox/mlx4/mcg.c           |   29 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |   10 +++++++
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   24 ++++++++++++++++
 include/linux/mlx4/cmd.h                           |    4 +++
 5 files changed, 86 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 842c8ce..7e94987 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -1080,6 +1080,25 @@ static struct mlx4_cmd_info cmd_info[] = {
 		.verify = NULL,
 		.wrapper = NULL
 	},
+	/* flow steering commands */
+	{
+		.opcode = MLX4_QP_FLOW_STEERING_ATTACH,
+		.has_inbox = true,
+		.has_outbox = false,
+		.out_is_imm = true,
+		.encode_slave_id = false,
+		.verify = NULL,
+		.wrapper = mlx4_QP_FLOW_STEERING_ATTACH_wrapper
+	},
+	{
+		.opcode = MLX4_QP_FLOW_STEERING_DETACH,
+		.has_inbox = false,
+		.has_outbox = false,
+		.out_is_imm = false,
+		.encode_slave_id = false,
+		.verify = NULL,
+		.wrapper = mlx4_QP_FLOW_STEERING_DETACH_wrapper
+	},
 };
 
 static int mlx4_master_process_vhcr(struct mlx4_dev *dev, int slave,
diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index 319c9d4..3c59a33 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -62,6 +62,35 @@ int mlx4_get_qp_per_mgm(struct mlx4_dev *dev)
 	return 4 * (mlx4_get_mgm_entry_size(dev) / 16 - 2);
 }
 
+static int mlx4_QP_FLOW_STEERING_ATTACH(struct mlx4_dev *dev,
+					struct mlx4_cmd_mailbox *mailbox,
+					u32 size,
+					u64 *reg_id)
+{
+	u64 imm;
+	int err = 0;
+
+	err = mlx4_cmd_imm(dev, mailbox->dma, &imm, size, 0,
+			   MLX4_QP_FLOW_STEERING_ATTACH, MLX4_CMD_TIME_CLASS_A,
+			   MLX4_CMD_NATIVE);
+	if (err)
+		return err;
+	*reg_id = imm;
+
+	return err;
+}
+
+static int mlx4_QP_FLOW_STEERING_DETACH(struct mlx4_dev *dev, u64 regid)
+{
+	int err = 0;
+
+	err = mlx4_cmd(dev, regid, 0, 0,
+		       MLX4_QP_FLOW_STEERING_DETACH, MLX4_CMD_TIME_CLASS_A,
+		       MLX4_CMD_NATIVE);
+
+	return err;
+}
+
 static int mlx4_READ_ENTRY(struct mlx4_dev *dev, int index,
 			   struct mlx4_cmd_mailbox *mailbox)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index a425a98..c07e882 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -1118,6 +1118,16 @@ int mlx4_QUERY_IF_STAT_wrapper(struct mlx4_dev *dev, int slave,
 			       struct mlx4_cmd_mailbox *inbox,
 			       struct mlx4_cmd_mailbox *outbox,
 			       struct mlx4_cmd_info *cmd);
+int mlx4_QP_FLOW_STEERING_ATTACH_wrapper(struct mlx4_dev *dev, int slave,
+					 struct mlx4_vhcr *vhcr,
+					 struct mlx4_cmd_mailbox *inbox,
+					 struct mlx4_cmd_mailbox *outbox,
+					 struct mlx4_cmd_info *cmd);
+int mlx4_QP_FLOW_STEERING_DETACH_wrapper(struct mlx4_dev *dev, int slave,
+					 struct mlx4_vhcr *vhcr,
+					 struct mlx4_cmd_mailbox *inbox,
+					 struct mlx4_cmd_mailbox *outbox,
+					 struct mlx4_cmd_info *cmd);
 
 int mlx4_get_mgm_entry_size(struct mlx4_dev *dev);
 int mlx4_get_qp_per_mgm(struct mlx4_dev *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index b8e8969..4301aa5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -2740,6 +2740,30 @@ ex_put:
 	return err;
 }
 
+int mlx4_QP_FLOW_STEERING_ATTACH_wrapper(struct mlx4_dev *dev, int slave,
+					 struct mlx4_vhcr *vhcr,
+					 struct mlx4_cmd_mailbox *inbox,
+					 struct mlx4_cmd_mailbox *outbox,
+					 struct mlx4_cmd_info *cmd)
+{
+	return mlx4_cmd_imm(dev, inbox->dma, &vhcr->out_param,
+			    vhcr->in_modifier, 0,
+			    MLX4_QP_FLOW_STEERING_ATTACH,
+			    MLX4_CMD_TIME_CLASS_A,
+			    MLX4_CMD_NATIVE);
+}
+
+int mlx4_QP_FLOW_STEERING_DETACH_wrapper(struct mlx4_dev *dev, int slave,
+					 struct mlx4_vhcr *vhcr,
+					 struct mlx4_cmd_mailbox *inbox,
+					 struct mlx4_cmd_mailbox *outbox,
+					 struct mlx4_cmd_info *cmd)
+{
+	return mlx4_cmd(dev, vhcr->in_param, 0, 0,
+			MLX4_QP_FLOW_STEERING_DETACH, MLX4_CMD_TIME_CLASS_A,
+			MLX4_CMD_NATIVE);
+}
+
 enum {
 	BUSY_MAX_RETRIES = 10
 };
diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h
index 1f3860a..2606951 100644
--- a/include/linux/mlx4/cmd.h
+++ b/include/linux/mlx4/cmd.h
@@ -154,6 +154,10 @@ enum {
 	/* set port opcode modifiers */
 	MLX4_SET_PORT_PRIO2TC = 0x8,
 	MLX4_SET_PORT_SCHEDULER  = 0x9,
+
+	/* register/delete flow steering network rules */
+	MLX4_QP_FLOW_STEERING_ATTACH = 0x65,
+	MLX4_QP_FLOW_STEERING_DETACH = 0x66,
 };
 
 enum {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-01  9:43 [PATCH net-next 00/10] net/mlx4: Add flow-steering support Or Gerlitz
                   ` (4 preceding siblings ...)
  2012-07-01  9:43 ` [PATCH net-next 05/10] net/mlx4_core: Add firmware commands to support device managed flow steering Or Gerlitz
@ 2012-07-01  9:43 ` Or Gerlitz
  2012-07-01 10:30   ` David Miller
  2012-07-01  9:43 ` [PATCH net-next 07/10] net/mlx4_core: Add resource tracking for device managed flow steering rules Or Gerlitz
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Or Gerlitz @ 2012-07-01  9:43 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, Hadar Hen Zion, Or Gerlitz

From: Hadar Hen Zion <hadarh@mellanox.co.il>

The driver is modified to support three operation modes.

If supported by firmware use the device managed flow steering
API, that which we call device managed steering mode. Else, if
the firmware supports the B0 steering mode use it, and finally,
if none of the above, use the A0 steering mode.

When the steering mode is device managed, the code is modified
such that L2 based rules set by the mlx4_en driver for Ethernet
unicast and multicast, and the IB stack multicast attach calls
done through the mlx4_ib driver are all routed to use the device
managed API.

When attaching rule using device managed flow steering API,
the firmware returns a 64 bit registration id, which is to be
provided during detach.

Set the firmware to use either standard L2 or L3/L4 optimized flow
steering configuration. Currently there's no dedicated firmware
command to set this, and it has to be provided to the device during
initialization, hence a module param was used here.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/infiniband/hw/mlx4/main.c                  |   62 +++-
 drivers/infiniband/hw/mlx4/mlx4_ib.h               |    1 +
 drivers/infiniband/hw/mlx4/qp.c                    |    1 +
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c     |   21 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c            |   90 ++++-
 drivers/net/ethernet/mellanox/mlx4/fw.h            |    3 +
 drivers/net/ethernet/mellanox/mlx4/main.c          |   63 +++-
 drivers/net/ethernet/mellanox/mlx4/mcg.c           |  365 +++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |   13 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h       |    2 +
 drivers/net/ethernet/mellanox/mlx4/port.c          |  102 ++++--
 drivers/net/ethernet/mellanox/mlx4/profile.c       |   12 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |    6 +
 include/linux/mlx4/device.h                        |  108 ++++++-
 14 files changed, 766 insertions(+), 83 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 3530c41..8a3a203 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -718,26 +718,53 @@ int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 	return ret;
 }
 
+struct mlx4_ib_steering {
+	struct list_head list;
+	u64 reg_id;
+	union ib_gid gid;
+};
+
 static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 {
 	int err;
 	struct mlx4_ib_dev *mdev = to_mdev(ibqp->device);
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
+	u64 reg_id;
+	struct mlx4_ib_steering *ib_steering = NULL;
+
+	if (mdev->dev->caps.steering_mode ==
+	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
+		ib_steering = kmalloc(sizeof(*ib_steering), GFP_KERNEL);
+		if (!ib_steering)
+			return -ENOMEM;
+	}
 
-	err = mlx4_multicast_attach(mdev->dev, &mqp->mqp, gid->raw,
-				    !!(mqp->flags & MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK),
-				    MLX4_PROT_IB_IPV6);
+	err = mlx4_multicast_attach(mdev->dev, &mqp->mqp, gid->raw, mqp->port,
+				    !!(mqp->flags &
+				       MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK),
+				    MLX4_PROT_IB_IPV6, &reg_id);
 	if (err)
-		return err;
+		goto err_malloc;
 
 	err = add_gid_entry(ibqp, gid);
 	if (err)
 		goto err_add;
 
+	if (ib_steering) {
+		memcpy(ib_steering->gid.raw, gid->raw, 16);
+		ib_steering->reg_id = reg_id;
+		mutex_lock(&mqp->mutex);
+		list_add(&ib_steering->list, &mqp->steering_rules);
+		mutex_unlock(&mqp->mutex);
+	}
 	return 0;
 
 err_add:
-	mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw, MLX4_PROT_IB_IPV6);
+	mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
+			      MLX4_PROT_IB_IPV6, reg_id);
+err_malloc:
+	kfree(ib_steering);
+
 	return err;
 }
 
@@ -765,9 +792,30 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	u8 mac[6];
 	struct net_device *ndev;
 	struct mlx4_ib_gid_entry *ge;
+	u64 reg_id = 0;
+
+	if (mdev->dev->caps.steering_mode ==
+	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
+		struct mlx4_ib_steering *ib_steering;
+
+		mutex_lock(&mqp->mutex);
+		list_for_each_entry(ib_steering, &mqp->steering_rules, list) {
+			if (!memcmp(ib_steering->gid.raw, gid->raw, 16)) {
+				list_del(&ib_steering->list);
+				break;
+			}
+		}
+		mutex_unlock(&mqp->mutex);
+		if (&ib_steering->list == &mqp->steering_rules) {
+			pr_err("Couldn't find reg_id for mgid. Steering rule is left attached\n");
+			return -EINVAL;
+		}
+		reg_id = ib_steering->reg_id;
+		kfree(ib_steering);
+	}
 
-	err = mlx4_multicast_detach(mdev->dev,
-				    &mqp->mqp, gid->raw, MLX4_PROT_IB_IPV6);
+	err = mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
+				    MLX4_PROT_IB_IPV6, reg_id);
 	if (err)
 		return err;
 
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index ff36655..42df4f7 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -163,6 +163,7 @@ struct mlx4_ib_qp {
 	u8			state;
 	int			mlx_type;
 	struct list_head	gid_list;
+	struct list_head	steering_rules;
 };
 
 struct mlx4_ib_srq {
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 8d4ed24..6af19f6 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -495,6 +495,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 	spin_lock_init(&qp->sq.lock);
 	spin_lock_init(&qp->rq.lock);
 	INIT_LIST_HEAD(&qp->gid_list);
+	INIT_LIST_HEAD(&qp->steering_rules);
 
 	qp->state	 = IB_QPS_RESET;
 	if (init_attr->sq_sig_type == IB_SIGNAL_ALL_WR)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index f9a4a00..62f0601 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -460,7 +460,8 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 				err = mlx4_multicast_detach(mdev->dev,
 							    &priv->rss_map.indir_qp,
 							    mc_list,
-							    MLX4_PROT_ETH);
+							    MLX4_PROT_ETH,
+							    mclist->reg_id);
 				if (err)
 					en_err(priv, "Fail to detach multicast address\n");
 
@@ -472,11 +473,14 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 			if (mclist->action == MCLIST_ADD) {
 				/* attach the address */
 				memcpy(&mc_list[10], mclist->addr, ETH_ALEN);
+				/* needed for B0 steering support */
 				mc_list[5] = priv->port;
 				err = mlx4_multicast_attach(mdev->dev,
 							    &priv->rss_map.indir_qp,
-							    mc_list, 0,
-							    MLX4_PROT_ETH);
+							    mc_list,
+							    priv->port, 0,
+							    MLX4_PROT_ETH,
+							    &mclist->reg_id);
 				if (err)
 					en_err(priv, "Fail to attach multicast address\n");
 
@@ -824,9 +828,10 @@ int mlx4_en_start_port(struct net_device *dev)
 
 	/* Attach rx QP to bradcast address */
 	memset(&mc_list[10], 0xff, ETH_ALEN);
-	mc_list[5] = priv->port;
+	mc_list[5] = priv->port; /* needed for B0 steering support */
 	if (mlx4_multicast_attach(mdev->dev, &priv->rss_map.indir_qp, mc_list,
-				  0, MLX4_PROT_ETH))
+				  priv->port, 0, MLX4_PROT_ETH,
+				  &priv->broadcast_id))
 		mlx4_warn(mdev, "Failed Attaching Broadcast\n");
 
 	/* Must redo promiscuous mode setup. */
@@ -883,14 +888,14 @@ void mlx4_en_stop_port(struct net_device *dev)
 
 	/* Detach All multicasts */
 	memset(&mc_list[10], 0xff, ETH_ALEN);
-	mc_list[5] = priv->port;
+	mc_list[5] = priv->port; /* needed for B0 steering support */
 	mlx4_multicast_detach(mdev->dev, &priv->rss_map.indir_qp, mc_list,
-			      MLX4_PROT_ETH);
+			      MLX4_PROT_ETH, priv->broadcast_id);
 	list_for_each_entry(mclist, &priv->curr_list, list) {
 		memcpy(&mc_list[10], mclist->addr, ETH_ALEN);
 		mc_list[5] = priv->port;
 		mlx4_multicast_detach(mdev->dev, &priv->rss_map.indir_qp,
-				      mc_list, MLX4_PROT_ETH);
+				      mc_list, MLX4_PROT_ETH, mclist->reg_id);
 	}
 	mlx4_en_clear_list(dev);
 	list_for_each_entry_safe(mclist, tmp, &priv->curr_list, list) {
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 40e048b..c7fb921 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -123,7 +123,8 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 flags)
 	static const char * const fname[] = {
 		[0] = "RSS support",
 		[1] = "RSS Toeplitz Hash Function support",
-		[2] = "RSS XOR Hash Function support"
+		[2] = "RSS XOR Hash Function support",
+		[3] = "Device manage flow steering support"
 	};
 	int i;
 
@@ -391,6 +392,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 #define QUERY_DEV_CAP_RSVD_XRC_OFFSET		0x66
 #define QUERY_DEV_CAP_MAX_XRC_OFFSET		0x67
 #define QUERY_DEV_CAP_MAX_COUNTERS_OFFSET	0x68
+#define QUERY_DEV_CAP_FLOW_STEERING_RANGE_EN_OFFSET	0x76
+#define QUERY_DEV_CAP_FLOW_STEERING_MAX_QP_OFFSET	0x77
 #define QUERY_DEV_CAP_RDMARC_ENTRY_SZ_OFFSET	0x80
 #define QUERY_DEV_CAP_QPC_ENTRY_SZ_OFFSET	0x82
 #define QUERY_DEV_CAP_AUX_ENTRY_SZ_OFFSET	0x84
@@ -474,6 +477,12 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev_cap->num_ports = field & 0xf;
 	MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_MSG_SZ_OFFSET);
 	dev_cap->max_msg_sz = 1 << (field & 0x1f);
+	MLX4_GET(field, outbox, QUERY_DEV_CAP_FLOW_STEERING_RANGE_EN_OFFSET);
+	if (field & 0x80)
+		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_FS_EN;
+	dev_cap->fs_log_max_ucast_qp_range_size = field & 0x1f;
+	MLX4_GET(field, outbox, QUERY_DEV_CAP_FLOW_STEERING_MAX_QP_OFFSET);
+	dev_cap->fs_max_num_qp_per_entry = field;
 	MLX4_GET(stat_rate, outbox, QUERY_DEV_CAP_RATE_SUPPORT_OFFSET);
 	dev_cap->stat_rate_support = stat_rate;
 	MLX4_GET(ext_flags, outbox, QUERY_DEV_CAP_EXT_FLAGS_OFFSET);
@@ -1061,6 +1070,15 @@ int mlx4_INIT_HCA(struct mlx4_dev *dev, struct mlx4_init_hca_param *param)
 #define	 INIT_HCA_LOG_MC_HASH_SZ_OFFSET	 (INIT_HCA_MCAST_OFFSET + 0x16)
 #define  INIT_HCA_UC_STEERING_OFFSET	 (INIT_HCA_MCAST_OFFSET + 0x18)
 #define	 INIT_HCA_LOG_MC_TABLE_SZ_OFFSET (INIT_HCA_MCAST_OFFSET + 0x1b)
+#define  INIT_HCA_DEVICE_MANAGED_FLOW_STEERING_EN	0x6
+#define  INIT_HCA_FS_PARAM_OFFSET         0x1d0
+#define  INIT_HCA_FS_BASE_OFFSET          (INIT_HCA_FS_PARAM_OFFSET + 0x00)
+#define  INIT_HCA_FS_LOG_ENTRY_SZ_OFFSET  (INIT_HCA_FS_PARAM_OFFSET + 0x12)
+#define  INIT_HCA_FS_LOG_TABLE_SZ_OFFSET  (INIT_HCA_FS_PARAM_OFFSET + 0x1b)
+#define  INIT_HCA_FS_ETH_BITS_OFFSET      (INIT_HCA_FS_PARAM_OFFSET + 0x21)
+#define  INIT_HCA_FS_ETH_NUM_ADDRS_OFFSET (INIT_HCA_FS_PARAM_OFFSET + 0x22)
+#define  INIT_HCA_FS_IB_BITS_OFFSET       (INIT_HCA_FS_PARAM_OFFSET + 0x25)
+#define  INIT_HCA_FS_IB_NUM_ADDRS_OFFSET  (INIT_HCA_FS_PARAM_OFFSET + 0x26)
 #define INIT_HCA_TPT_OFFSET		 0x0f0
 #define	 INIT_HCA_DMPT_BASE_OFFSET	 (INIT_HCA_TPT_OFFSET + 0x00)
 #define	 INIT_HCA_LOG_MPT_SZ_OFFSET	 (INIT_HCA_TPT_OFFSET + 0x0b)
@@ -1119,14 +1137,44 @@ int mlx4_INIT_HCA(struct mlx4_dev *dev, struct mlx4_init_hca_param *param)
 	MLX4_PUT(inbox, param->rdmarc_base,   INIT_HCA_RDMARC_BASE_OFFSET);
 	MLX4_PUT(inbox, param->log_rd_per_qp, INIT_HCA_LOG_RD_OFFSET);
 
-	/* multicast attributes */
+	/* steering attributes */
+	if (dev->caps.steering_mode ==
+	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
+		*(inbox + INIT_HCA_FLAGS_OFFSET / 4) |=
+			cpu_to_be32(1 <<
+				    INIT_HCA_DEVICE_MANAGED_FLOW_STEERING_EN);
+
+		MLX4_PUT(inbox, param->mc_base, INIT_HCA_FS_BASE_OFFSET);
+		MLX4_PUT(inbox, param->log_mc_entry_sz,
+			 INIT_HCA_FS_LOG_ENTRY_SZ_OFFSET);
+		MLX4_PUT(inbox, param->log_mc_table_sz,
+			 INIT_HCA_FS_LOG_TABLE_SZ_OFFSET);
+		/* Enable Ethernet flow steering
+		 * with udp unicast and tcp unicast */
+		MLX4_PUT(inbox, param->fs_hash_enable_bits,
+			 INIT_HCA_FS_ETH_BITS_OFFSET);
+		MLX4_PUT(inbox, (u16) MLX4_FS_NUM_OF_L2_ADDR,
+			 INIT_HCA_FS_ETH_NUM_ADDRS_OFFSET);
+		/* Enable IPoIB flow steering
+		 * with udp unicast and tcp unicast */
+		MLX4_PUT(inbox, param->fs_hash_enable_bits,
+			 INIT_HCA_FS_IB_BITS_OFFSET);
+		MLX4_PUT(inbox, (u16) MLX4_FS_NUM_OF_L2_ADDR,
+			 INIT_HCA_FS_IB_NUM_ADDRS_OFFSET);
 
-	MLX4_PUT(inbox, param->mc_base,		INIT_HCA_MC_BASE_OFFSET);
-	MLX4_PUT(inbox, param->log_mc_entry_sz, INIT_HCA_LOG_MC_ENTRY_SZ_OFFSET);
-	MLX4_PUT(inbox, param->log_mc_hash_sz,  INIT_HCA_LOG_MC_HASH_SZ_OFFSET);
-	if (dev->caps.steering_mode == MLX4_STEERING_MODE_B0)
-		MLX4_PUT(inbox, (u8) (1 << 3),	INIT_HCA_UC_STEERING_OFFSET);
-	MLX4_PUT(inbox, param->log_mc_table_sz, INIT_HCA_LOG_MC_TABLE_SZ_OFFSET);
+	} else {
+
+		MLX4_PUT(inbox, param->mc_base,	INIT_HCA_MC_BASE_OFFSET);
+		MLX4_PUT(inbox, param->log_mc_entry_sz,
+			 INIT_HCA_LOG_MC_ENTRY_SZ_OFFSET);
+		MLX4_PUT(inbox, param->log_mc_hash_sz,
+			 INIT_HCA_LOG_MC_HASH_SZ_OFFSET);
+		MLX4_PUT(inbox, param->log_mc_table_sz,
+			 INIT_HCA_LOG_MC_TABLE_SZ_OFFSET);
+		if (dev->caps.steering_mode == MLX4_STEERING_MODE_B0)
+			MLX4_PUT(inbox, (u8) (1 << 3),
+				 INIT_HCA_UC_STEERING_OFFSET);
+	}
 
 	/* TPT attributes */
 
@@ -1188,15 +1236,25 @@ int mlx4_QUERY_HCA(struct mlx4_dev *dev,
 	MLX4_GET(param->rdmarc_base,   outbox, INIT_HCA_RDMARC_BASE_OFFSET);
 	MLX4_GET(param->log_rd_per_qp, outbox, INIT_HCA_LOG_RD_OFFSET);
 
-	/* multicast attributes */
+	/* steering attributes */
+	if (dev->caps.steering_mode ==
+	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
 
-	MLX4_GET(param->mc_base,         outbox, INIT_HCA_MC_BASE_OFFSET);
-	MLX4_GET(param->log_mc_entry_sz, outbox,
-		 INIT_HCA_LOG_MC_ENTRY_SZ_OFFSET);
-	MLX4_GET(param->log_mc_hash_sz,  outbox,
-		 INIT_HCA_LOG_MC_HASH_SZ_OFFSET);
-	MLX4_GET(param->log_mc_table_sz, outbox,
-		 INIT_HCA_LOG_MC_TABLE_SZ_OFFSET);
+		MLX4_GET(param->mc_base, outbox, INIT_HCA_FS_BASE_OFFSET);
+		MLX4_GET(param->log_mc_entry_sz, outbox,
+			 INIT_HCA_FS_LOG_ENTRY_SZ_OFFSET);
+		MLX4_GET(param->log_mc_table_sz, outbox,
+			 INIT_HCA_FS_LOG_TABLE_SZ_OFFSET);
+	} else {
+
+		MLX4_GET(param->mc_base, outbox, INIT_HCA_MC_BASE_OFFSET);
+		MLX4_GET(param->log_mc_entry_sz, outbox,
+			 INIT_HCA_LOG_MC_ENTRY_SZ_OFFSET);
+		MLX4_GET(param->log_mc_hash_sz,  outbox,
+			 INIT_HCA_LOG_MC_HASH_SZ_OFFSET);
+		MLX4_GET(param->log_mc_table_sz, outbox,
+			 INIT_HCA_LOG_MC_TABLE_SZ_OFFSET);
+	}
 
 	/* TPT attributes */
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.h b/drivers/net/ethernet/mellanox/mlx4/fw.h
index 64c0399..83fcbbf 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.h
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.h
@@ -78,6 +78,8 @@ struct mlx4_dev_cap {
 	u16 wavelength[MLX4_MAX_PORTS + 1];
 	u64 trans_code[MLX4_MAX_PORTS + 1];
 	u16 stat_rate_support;
+	int fs_log_max_ucast_qp_range_size;
+	int fs_max_num_qp_per_entry;
 	u64 flags;
 	u64 flags2;
 	int reserved_uars;
@@ -165,6 +167,7 @@ struct mlx4_init_hca_param {
 	u8  log_mpt_sz;
 	u8  log_uar_sz;
 	u8  uar_page_sz; /* log pg sz in 4k chunks */
+	u8  fs_hash_enable_bits;
 };
 
 struct mlx4_init_ib_param {
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 7df0517..cf5eb2c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -90,7 +90,9 @@ module_param_named(log_num_mgm_entry_size,
 MODULE_PARM_DESC(log_num_mgm_entry_size, "log mgm size, that defines the num"
 					 " of qp per mcg, for example:"
 					 " 10 gives 248.range: 9<="
-					 " log_num_mgm_entry_size <= 12");
+					 " log_num_mgm_entry_size <= 12."
+					 " Not in use with device managed"
+					 " flow steering");
 
 #define MLX4_VF                                        (1 << 0)
 
@@ -136,6 +138,11 @@ module_param_array(port_type_array, int, &arr_argc, 0444);
 MODULE_PARM_DESC(port_type_array, "Array of port types: HW_DEFAULT (0) is default "
 				"1 for IB, 2 for Ethernet");
 
+static int mlx4_flow_steering_hash;
+module_param_named(flow_steering_hash, mlx4_flow_steering_hash, int, 0444);
+MODULE_PARM_DESC(flow_steering_hash,
+		 "Flow steering hash function configuration. Config options: L2 = 0, L2_L3_L4 = 1 (default: L2).");
+
 struct mlx4_port_config {
 	struct list_head list;
 	enum mlx4_port_type port_type[MLX4_MAX_PORTS + 1];
@@ -273,21 +280,29 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	dev->caps.max_gso_sz	     = dev_cap->max_gso_sz;
 	dev->caps.max_rss_tbl_sz     = dev_cap->max_rss_tbl_sz;
 
-	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER &&
-	    dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER) {
-		dev->caps.steering_mode = MLX4_STEERING_MODE_B0;
-
+	if (dev_cap->flags2 & MLX4_DEV_CAP_FLAG2_FS_EN) {
+		dev->caps.steering_mode = MLX4_STEERING_MODE_DEVICE_MANAGED;
+		dev->caps.num_qp_per_mgm = dev_cap->fs_max_num_qp_per_entry;
+		dev->caps.fs_log_max_ucast_qp_range_size =
+			dev_cap->fs_log_max_ucast_qp_range_size;
 	} else {
-		dev->caps.steering_mode = MLX4_STEERING_MODE_A0;
+		if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER &&
+		    dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER) {
+			dev->caps.steering_mode = MLX4_STEERING_MODE_B0;
+
+		} else {
+
+			dev->caps.steering_mode = MLX4_STEERING_MODE_A0;
 
-		if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER ||
-		    dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER)
-			mlx4_warn(dev, "Must have UC_STEER and MC_STEER flags "
-				       "set to use B0 steering. Falling back to A0 steering mode.\n");
+			if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER ||
+			    dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER)
+				mlx4_warn(dev, "Must have UC_STEER and MC_STEER flags "
+						"set to use B0 steering. Falling back to A0 steering mode.\n");
+		}
+		dev->caps.num_qp_per_mgm = mlx4_get_qp_per_mgm(dev);
 	}
 	mlx4_dbg(dev, "Steering mode is: %s\n",
 		 mlx4_steering_mode_str(dev->caps.steering_mode));
-	dev->caps.num_qp_per_mgm = mlx4_get_qp_per_mgm(dev);
 
 	/* Sense port always allowed on supported devices for ConnectX1 and 2 */
 	if (dev->pdev->device != 0x1003)
@@ -982,9 +997,11 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap,
 	}
 
 	/*
-	 * It's not strictly required, but for simplicity just map the
-	 * whole multicast group table now.  The table isn't very big
-	 * and it's a lot easier than trying to track ref counts.
+	 * For flow steering device managed mode it is required to use
+	 * mlx4_init_icm_table. For B0 steering mode it's not strictly
+	 * required, but for simplicity just map the whole multicast
+	 * group table now.  The table isn't very big and it's a lot
+	 * easier than trying to track ref counts.
 	 */
 	err = mlx4_init_icm_table(dev, &priv->mcg_table.table,
 				  init_hca->mc_base,
@@ -1220,7 +1237,25 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
 			goto err_stop_fw;
 		}
 
+		priv->fs_hash_mode = mlx4_flow_steering_hash;
+
+		switch (priv->fs_hash_mode) {
+		case MLX4_FS_L2_HASH:
+			init_hca.fs_hash_enable_bits = 0;
+			break;
+
+		case MLX4_FS_L2_L3_L4_HASH:
+			/* Enable flow steering with
+			   udp unicast and tcp unicast*/
+			init_hca.fs_hash_enable_bits =
+				MLX4_FS_UDP_UC_EN | MLX4_FS_TCP_UC_EN;
+			break;
+		}
+
 		profile = default_profile;
+		if (dev->caps.steering_mode ==
+		    MLX4_STEERING_MODE_DEVICE_MANAGED)
+			profile.num_mcg = MLX4_FS_NUM_MCG;
 
 		icm_size = mlx4_make_profile(dev, &profile, &dev_cap,
 					     &init_hca);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index 3c59a33..c567655 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -41,6 +41,7 @@
 
 #define MGM_QPN_MASK       0x00FFFFFF
 #define MGM_BLCK_LB_BIT    30
+#define MLX4_MAC_MASK	   0xffffffffffffULL
 
 static const u8 zero_gid[16];	/* automatically initialized to 0 */
 
@@ -54,7 +55,12 @@ struct mlx4_mgm {
 
 int mlx4_get_mgm_entry_size(struct mlx4_dev *dev)
 {
-	return min((1 << mlx4_log_num_mgm_entry_size), MLX4_MAX_MGM_ENTRY_SIZE);
+	if (dev->caps.steering_mode ==
+	    MLX4_STEERING_MODE_DEVICE_MANAGED)
+		return 1 << MLX4_FS_MGM_LOG_ENTRY_SIZE;
+	else
+		return min((1 << mlx4_log_num_mgm_entry_size),
+			   MLX4_MAX_MGM_ENTRY_SIZE);
 }
 
 int mlx4_get_qp_per_mgm(struct mlx4_dev *dev)
@@ -643,6 +649,311 @@ static int find_entry(struct mlx4_dev *dev, u8 port,
 	return err;
 }
 
+struct mlx4_net_trans_rule_hw_ctrl {
+	__be32 ctrl;
+	__be32 vf_vep_port;
+	__be32 qpn;
+	__be32 reserved;
+};
+
+static void trans_rule_ctrl_to_hw(struct mlx4_net_trans_rule *ctrl,
+				struct mlx4_net_trans_rule_hw_ctrl *hw)
+{
+	static const u8 __promisc_mode[] = {
+		[MLX4_FS_PROMISC_NONE]   = 0x0,
+		[MLX4_FS_PROMISC_UPLINK] = 0x1,
+		[MLX4_FS_PROMISC_FUNCTION_PORT] = 0x2,
+		[MLX4_FS_PROMISC_ALL_MULTI] = 0x3,
+	};
+
+	u32 dw = 0;
+
+	dw = ctrl->queue_mode == MLX4_NET_TRANS_Q_LIFO ? 1 : 0;
+	dw |= ctrl->exclusive ? (1 << 2) : 0;
+	dw |= ctrl->allow_loopback ? (1 << 3) : 0;
+	dw |= __promisc_mode[ctrl->promisc_mode] << 8;
+	dw |= ctrl->priority << 16;
+
+	hw->ctrl = cpu_to_be32(dw);
+	hw->vf_vep_port = cpu_to_be32(ctrl->port);
+	hw->qpn = cpu_to_be32(ctrl->qpn);
+}
+
+struct mlx4_net_trans_rule_hw_ib {
+	u8	size;
+	u8	rsvd1;
+	__be16	id;
+	u32	rsvd2;
+	__be32	qpn;
+	__be32	qpn_mask;
+	u8	dst_gid[16];
+	u8	dst_gid_msk[16];
+} __packed;
+
+struct mlx4_net_trans_rule_hw_eth {
+	u8	size;
+	u8	rsvd;
+	__be16	id;
+	u8	rsvd1[6];
+	u8	dst_mac[6];
+	u16	rsvd2;
+	u8	dst_mac_msk[6];
+	u16	rsvd3;
+	u8	src_mac[6];
+	u16	rsvd4;
+	u8	src_mac_msk[6];
+	u8      rsvd5;
+	u8      ether_type_enable;
+	__be16  ether_type;
+	__be16  vlan_id_msk;
+	__be16  vlan_id;
+} __packed;
+
+struct mlx4_net_trans_rule_hw_tcp_udp {
+	u8	size;
+	u8	rsvd;
+	__be16	id;
+	__be16	rsvd1[3];
+	__be16	dst_port;
+	__be16	rsvd2;
+	__be16	dst_port_msk;
+	__be16	rsvd3;
+	__be16	src_port;
+	__be16	rsvd4;
+	__be16	src_port_msk;
+} __packed;
+
+struct mlx4_net_trans_rule_hw_ipv4 {
+	u8	size;
+	u8	rsvd;
+	__be16	id;
+	__be32	rsvd1;
+	__be32	dst_ip;
+	__be32	dst_ip_msk;
+	__be32	src_ip;
+	__be32	src_ip_msk;
+} __packed;
+
+struct _rule_hw {
+	union {
+		struct {
+			u8 size;
+			u8 rsvd;
+			__be16 id;
+		};
+		struct mlx4_net_trans_rule_hw_eth eth;
+		struct mlx4_net_trans_rule_hw_ib ib;
+		struct mlx4_net_trans_rule_hw_ipv4 ipv4;
+		struct mlx4_net_trans_rule_hw_tcp_udp tcp_udp;
+	};
+};
+
+static int parse_trans_rule(struct mlx4_dev *dev, struct mlx4_spec_list *spec,
+			    struct _rule_hw *rule_hw)
+{
+	static const u16 __sw_id_hw[] = {
+		[MLX4_NET_TRANS_RULE_ID_ETH]     = 0xE001,
+		[MLX4_NET_TRANS_RULE_ID_IB]      = 0xE005,
+		[MLX4_NET_TRANS_RULE_ID_IPV6]    = 0xE003,
+		[MLX4_NET_TRANS_RULE_ID_IPV4]    = 0xE002,
+		[MLX4_NET_TRANS_RULE_ID_TCP]     = 0xE004,
+		[MLX4_NET_TRANS_RULE_ID_UDP]     = 0xE006
+	};
+
+	static const size_t __rule_hw_sz[] = {
+		[MLX4_NET_TRANS_RULE_ID_ETH] =
+			sizeof(struct mlx4_net_trans_rule_hw_eth),
+		[MLX4_NET_TRANS_RULE_ID_IB] =
+			sizeof(struct mlx4_net_trans_rule_hw_ib),
+		[MLX4_NET_TRANS_RULE_ID_IPV6] = 0,
+		[MLX4_NET_TRANS_RULE_ID_IPV4] =
+			sizeof(struct mlx4_net_trans_rule_hw_ipv4),
+		[MLX4_NET_TRANS_RULE_ID_TCP] =
+			sizeof(struct mlx4_net_trans_rule_hw_tcp_udp),
+		[MLX4_NET_TRANS_RULE_ID_UDP] =
+			sizeof(struct mlx4_net_trans_rule_hw_tcp_udp)
+	};
+	if (spec->id > MLX4_NET_TRANS_RULE_NUM) {
+		mlx4_err(dev, "Invalid network rule id. id = %d\n", spec->id);
+		return -EINVAL;
+	}
+	memset(rule_hw, 0, __rule_hw_sz[spec->id]);
+	rule_hw->id = cpu_to_be16(__sw_id_hw[spec->id]);
+	rule_hw->size = __rule_hw_sz[spec->id] >> 2;
+
+	switch (spec->id) {
+	case MLX4_NET_TRANS_RULE_ID_ETH:
+		memcpy(rule_hw->eth.dst_mac, spec->eth.dst_mac, ETH_ALEN);
+		memcpy(rule_hw->eth.dst_mac_msk, spec->eth.dst_mac_msk,
+								ETH_ALEN);
+		memcpy(rule_hw->eth.src_mac, spec->eth.src_mac, ETH_ALEN);
+		memcpy(rule_hw->eth.src_mac_msk, spec->eth.src_mac_msk,
+								ETH_ALEN);
+		if (spec->eth.ether_type_enable) {
+			rule_hw->eth.ether_type_enable = 1;
+			rule_hw->eth.ether_type = spec->eth.ether_type;
+		}
+		rule_hw->eth.vlan_id = spec->eth.vlan_id;
+		rule_hw->eth.vlan_id_msk = spec->eth.vlan_id_msk;
+		break;
+
+	case MLX4_NET_TRANS_RULE_ID_IB:
+		rule_hw->ib.qpn = spec->ib.r_qpn;
+		rule_hw->ib.qpn_mask = spec->ib.qpn_msk;
+		memcpy(&rule_hw->ib.dst_gid, &spec->ib.dst_gid, 16);
+		memcpy(&rule_hw->ib.dst_gid_msk, &spec->ib.dst_gid_msk, 16);
+		break;
+
+	case MLX4_NET_TRANS_RULE_ID_IPV6:
+		return -EOPNOTSUPP;
+
+	case MLX4_NET_TRANS_RULE_ID_IPV4:
+		rule_hw->ipv4.src_ip = spec->ipv4.src_ip;
+		rule_hw->ipv4.src_ip_msk = spec->ipv4.src_ip_msk;
+		rule_hw->ipv4.dst_ip = spec->ipv4.dst_ip;
+		rule_hw->ipv4.dst_ip_msk = spec->ipv4.dst_ip_msk;
+		break;
+
+	case MLX4_NET_TRANS_RULE_ID_TCP:
+	case MLX4_NET_TRANS_RULE_ID_UDP:
+		rule_hw->tcp_udp.dst_port = spec->tcp_udp.dst_port;
+		rule_hw->tcp_udp.dst_port_msk = spec->tcp_udp.dst_port_msk;
+		rule_hw->tcp_udp.src_port = spec->tcp_udp.src_port;
+		rule_hw->tcp_udp.src_port_msk = spec->tcp_udp.src_port_msk;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	return __rule_hw_sz[spec->id];
+}
+
+static void mlx4_err_rule(struct mlx4_dev *dev, char *str,
+			  struct mlx4_net_trans_rule *rule)
+{
+#define BUF_SIZE 256
+	struct mlx4_spec_list *cur;
+	char buf[BUF_SIZE];
+	int len = 0;
+
+	mlx4_err(dev, "%s", str);
+	len += snprintf(buf + len, BUF_SIZE - len,
+			"port = %d prio = 0x%x qp = 0x%x ",
+			rule->port, rule->priority, rule->qpn);
+
+	list_for_each_entry(cur, &rule->list, list) {
+		switch (cur->id) {
+		case MLX4_NET_TRANS_RULE_ID_ETH:
+			len += snprintf(buf + len, BUF_SIZE - len,
+					"dmac = %pM ", &cur->eth.dst_mac);
+			if (cur->eth.ether_type)
+				len += snprintf(buf + len, BUF_SIZE - len,
+						"ethertype = 0x%x ",
+						be16_to_cpu(cur->eth.ether_type));
+			if (cur->eth.vlan_id)
+				len += snprintf(buf + len, BUF_SIZE - len,
+						"vlan-id = %d ",
+						be16_to_cpu(cur->eth.vlan_id));
+			break;
+
+		case MLX4_NET_TRANS_RULE_ID_IPV4:
+			if (cur->ipv4.src_ip)
+				len += snprintf(buf + len, BUF_SIZE - len,
+						"src-ip = %pI4 ",
+						&cur->ipv4.src_ip);
+			if (cur->ipv4.dst_ip)
+				len += snprintf(buf + len, BUF_SIZE - len,
+						"dst-ip = %pI4 ",
+						&cur->ipv4.dst_ip);
+			break;
+
+		case MLX4_NET_TRANS_RULE_ID_TCP:
+		case MLX4_NET_TRANS_RULE_ID_UDP:
+			if (cur->tcp_udp.src_port)
+				len += snprintf(buf + len, BUF_SIZE - len,
+						"src-port = %d ",
+						be16_to_cpu(cur->tcp_udp.src_port));
+			if (cur->tcp_udp.dst_port)
+				len += snprintf(buf + len, BUF_SIZE - len,
+						"dst-port = %d ",
+						be16_to_cpu(cur->tcp_udp.dst_port));
+			break;
+
+		case MLX4_NET_TRANS_RULE_ID_IB:
+			len += snprintf(buf + len, BUF_SIZE - len,
+					"dst-gid = %pI6\n", cur->ib.dst_gid);
+			len += snprintf(buf + len, BUF_SIZE - len,
+					"dst-gid-mask = %pI6\n",
+					cur->ib.dst_gid_msk);
+			break;
+
+		case MLX4_NET_TRANS_RULE_ID_IPV6:
+			break;
+
+		default:
+			break;
+		}
+	}
+	len += snprintf(buf + len, BUF_SIZE - len, "\n");
+	mlx4_err(dev, "%s", buf);
+
+	if (len >= BUF_SIZE)
+		mlx4_err(dev, "Network rule error message was truncated, print buffer is too small.\n");
+}
+
+int mlx4_flow_attach(struct mlx4_dev *dev,
+		     struct mlx4_net_trans_rule *rule, u64 *reg_id)
+{
+	struct mlx4_cmd_mailbox *mailbox;
+	struct mlx4_spec_list *cur;
+	u32 size = 0;
+	int ret;
+
+	mailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(mailbox))
+		return PTR_ERR(mailbox);
+
+	memset(mailbox->buf, 0, sizeof(struct mlx4_net_trans_rule_hw_ctrl));
+	trans_rule_ctrl_to_hw(rule, mailbox->buf);
+
+	size += sizeof(struct mlx4_net_trans_rule_hw_ctrl);
+
+	list_for_each_entry(cur, &rule->list, list) {
+		ret = parse_trans_rule(dev, cur, mailbox->buf + size);
+		if (ret < 0) {
+			mlx4_free_cmd_mailbox(dev, mailbox);
+			return -EINVAL;
+		}
+		size += ret;
+	}
+
+	ret = mlx4_QP_FLOW_STEERING_ATTACH(dev, mailbox, size >> 2, reg_id);
+	if (ret == -ENOMEM)
+		mlx4_err_rule(dev,
+			      "mcg table is full. Fail to register network rule.\n",
+			      rule);
+	else if (ret)
+		mlx4_err_rule(dev, "Fail to register network rule.\n", rule);
+
+	mlx4_free_cmd_mailbox(dev, mailbox);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mlx4_flow_attach);
+
+int mlx4_flow_detach(struct mlx4_dev *dev, u64 reg_id)
+{
+	int err;
+
+	err = mlx4_QP_FLOW_STEERING_DETACH(dev, reg_id);
+	if (err)
+		mlx4_err(dev, "Fail to detach network rule. registration id = 0x%llx\n"
+								, reg_id);
+	return err;
+}
+EXPORT_SYMBOL_GPL(mlx4_flow_detach);
+
 int mlx4_qp_attach_common(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
 			  int block_mcast_loopback, enum mlx4_protocol prot,
 			  enum mlx4_steer_type steer)
@@ -895,7 +1206,8 @@ static int mlx4_QP_ATTACH(struct mlx4_dev *dev, struct mlx4_qp *qp,
 }
 
 int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
-			  int block_mcast_loopback, enum mlx4_protocol prot)
+			  u8 port, int block_mcast_loopback,
+			  enum mlx4_protocol prot, u64 *reg_id)
 {
 
 	switch (dev->caps.steering_mode) {
@@ -914,6 +1226,42 @@ int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
 					     block_mcast_loopback, prot,
 					     MLX4_MC_STEER);
 
+	case MLX4_STEERING_MODE_DEVICE_MANAGED: {
+		struct mlx4_spec_list spec = { {NULL} };
+		__be64 mac_mask = cpu_to_be64(MLX4_MAC_MASK << 16);
+
+		struct mlx4_net_trans_rule rule = {
+			.queue_mode = MLX4_NET_TRANS_Q_FIFO,
+			.exclusive = 0,
+			.promisc_mode = MLX4_FS_PROMISC_NONE,
+			.priority = MLX4_DOMAIN_NIC,
+		};
+
+		rule.allow_loopback = ~block_mcast_loopback;
+		rule.port = port;
+		rule.qpn = qp->qpn;
+		INIT_LIST_HEAD(&rule.list);
+
+		switch (prot) {
+		case MLX4_PROT_ETH:
+			spec.id = MLX4_NET_TRANS_RULE_ID_ETH;
+			memcpy(spec.eth.dst_mac, &gid[10], ETH_ALEN);
+			memcpy(spec.eth.dst_mac_msk, &mac_mask, ETH_ALEN);
+			break;
+
+		case MLX4_PROT_IB_IPV6:
+			spec.id = MLX4_NET_TRANS_RULE_ID_IB;
+			memcpy(spec.ib.dst_gid, gid, 16);
+			memset(&spec.ib.dst_gid_msk, 0xff, 16);
+			break;
+		default:
+			return -EINVAL;
+		}
+		list_add_tail(&spec.list, &rule.list);
+
+		return  mlx4_flow_attach(dev, &rule, reg_id);
+	}
+
 	default:
 		return -EINVAL;
 	}
@@ -921,7 +1269,7 @@ int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
 EXPORT_SYMBOL_GPL(mlx4_multicast_attach);
 
 int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
-			  enum mlx4_protocol prot)
+			  enum mlx4_protocol prot, u64 reg_id)
 {
 	switch (dev->caps.steering_mode) {
 	case MLX4_STEERING_MODE_A0:
@@ -938,6 +1286,9 @@ int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
 		return mlx4_qp_detach_common(dev, qp, gid, prot,
 					     MLX4_MC_STEER);
 
+	case MLX4_STEERING_MODE_DEVICE_MANAGED:
+		return mlx4_flow_detach(dev, reg_id);
+
 	default:
 		return -EINVAL;
 	}
@@ -1042,6 +1393,10 @@ int mlx4_init_mcg_table(struct mlx4_dev *dev)
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	int err;
 
+	/* No need for mcg_table when fw managed the mcg table*/
+	if (dev->caps.steering_mode ==
+	    MLX4_STEERING_MODE_DEVICE_MANAGED)
+		return 0;
 	err = mlx4_bitmap_init(&priv->mcg_table.bitmap, dev->caps.num_amgms,
 			       dev->caps.num_amgms - 1, 0, 0);
 	if (err)
@@ -1054,5 +1409,7 @@ int mlx4_init_mcg_table(struct mlx4_dev *dev)
 
 void mlx4_cleanup_mcg_table(struct mlx4_dev *dev)
 {
-	mlx4_bitmap_cleanup(&mlx4_priv(dev)->mcg_table.bitmap);
+	if (dev->caps.steering_mode !=
+	    MLX4_STEERING_MODE_DEVICE_MANAGED)
+		mlx4_bitmap_cleanup(&mlx4_priv(dev)->mcg_table.bitmap);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index c07e882..0084967 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -54,6 +54,17 @@
 #define DRV_VERSION	"1.1"
 #define DRV_RELDATE	"Dec, 2011"
 
+#define MLX4_FS_UDP_UC_EN		(1 << 1)
+#define MLX4_FS_TCP_UC_EN		(1 << 2)
+#define MLX4_FS_NUM_OF_L2_ADDR		8
+#define MLX4_FS_MGM_LOG_ENTRY_SIZE	7
+#define MLX4_FS_NUM_MCG			(1 << 17)
+
+enum {
+	MLX4_FS_L2_HASH = 0,
+	MLX4_FS_L2_L3_L4_HASH,
+};
+
 #define MLX4_NUM_UP		8
 #define MLX4_NUM_TC		8
 #define MLX4_RATELIMIT_UNITS 3 /* 100 Mbps */
@@ -704,6 +715,7 @@ struct mlx4_set_port_rqp_calc_context {
 
 struct mlx4_mac_entry {
 	u64 mac;
+	u64 reg_id;
 };
 
 struct mlx4_port_info {
@@ -777,6 +789,7 @@ struct mlx4_priv {
 	struct mutex		bf_mutex;
 	struct io_mapping	*bf_mapping;
 	int			reserved_mtts;
+	int			fs_hash_mode;
 };
 
 static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 1bb00cd..2d6dabe 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -414,6 +414,7 @@ struct mlx4_en_mc_list {
 	struct list_head	list;
 	enum mlx4_en_mclist_act	action;
 	u8			addr[ETH_ALEN];
+	u64			reg_id;
 };
 
 struct mlx4_en_frag_info {
@@ -503,6 +504,7 @@ struct mlx4_en_priv {
 	u64 stats_bitmap;
 	struct list_head mc_list;
 	struct list_head curr_list;
+	u64 broadcast_id;
 	struct mlx4_en_stat_out_mbox hw_stats;
 	int vids[128];
 	bool wol;
diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
index 58de723..a51d1b9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -75,21 +75,54 @@ void mlx4_init_vlan_table(struct mlx4_dev *dev, struct mlx4_vlan_table *table)
 	table->total = 0;
 }
 
-static int mlx4_uc_steer_add(struct mlx4_dev *dev, u8 port, u64 mac, int *qpn)
+static int mlx4_uc_steer_add(struct mlx4_dev *dev, u8 port,
+			     u64 mac, int *qpn, u64 *reg_id)
 {
-	struct mlx4_qp qp;
-	u8 gid[16] = {0};
 	__be64 be_mac;
 	int err;
 
-	qp.qpn = *qpn;
-
-	mac &= 0xffffffffffffULL;
+	mac &= MLX4_MAC_MASK;
 	be_mac = cpu_to_be64(mac << 16);
-	memcpy(&gid[10], &be_mac, ETH_ALEN);
-	gid[5] = port;
 
-	err = mlx4_unicast_attach(dev, &qp, gid, 0, MLX4_PROT_ETH);
+	switch (dev->caps.steering_mode) {
+	case MLX4_STEERING_MODE_B0: {
+		struct mlx4_qp qp;
+		u8 gid[16] = {0};
+
+		qp.qpn = *qpn;
+		memcpy(&gid[10], &be_mac, ETH_ALEN);
+		gid[5] = port;
+
+		err = mlx4_unicast_attach(dev, &qp, gid, 0, MLX4_PROT_ETH);
+		break;
+	}
+	case MLX4_STEERING_MODE_DEVICE_MANAGED: {
+		struct mlx4_spec_list spec_eth = { {NULL} };
+		__be64 mac_mask = cpu_to_be64(MLX4_MAC_MASK << 16);
+
+		struct mlx4_net_trans_rule rule = {
+			.queue_mode = MLX4_NET_TRANS_Q_FIFO,
+			.exclusive = 0,
+			.allow_loopback = 1,
+			.promisc_mode = MLX4_FS_PROMISC_NONE,
+			.priority = MLX4_DOMAIN_NIC,
+		};
+
+		rule.port = port;
+		rule.qpn = *qpn;
+		INIT_LIST_HEAD(&rule.list);
+
+		spec_eth.id = MLX4_NET_TRANS_RULE_ID_ETH;
+		memcpy(spec_eth.eth.dst_mac, &be_mac, ETH_ALEN);
+		memcpy(spec_eth.eth.dst_mac_msk, &mac_mask, ETH_ALEN);
+		list_add_tail(&spec_eth.list, &rule.list);
+
+		err = mlx4_flow_attach(dev, &rule, reg_id);
+		break;
+	}
+	default:
+		return -EINVAL;
+	}
 	if (err)
 		mlx4_warn(dev, "Failed Attaching Unicast\n");
 
@@ -97,19 +130,30 @@ static int mlx4_uc_steer_add(struct mlx4_dev *dev, u8 port, u64 mac, int *qpn)
 }
 
 static void mlx4_uc_steer_release(struct mlx4_dev *dev, u8 port,
-				  u64 mac, int qpn)
+				  u64 mac, int qpn, u64 reg_id)
 {
-	struct mlx4_qp qp;
-	u8 gid[16] = {0};
-	__be64 be_mac;
+	switch (dev->caps.steering_mode) {
+	case MLX4_STEERING_MODE_B0: {
+		struct mlx4_qp qp;
+		u8 gid[16] = {0};
+		__be64 be_mac;
 
-	qp.qpn = qpn;
-	mac &= 0xffffffffffffULL;
-	be_mac = cpu_to_be64(mac << 16);
-	memcpy(&gid[10], &be_mac, ETH_ALEN);
-	gid[5] = port;
+		qp.qpn = qpn;
+		mac &= MLX4_MAC_MASK;
+		be_mac = cpu_to_be64(mac << 16);
+		memcpy(&gid[10], &be_mac, ETH_ALEN);
+		gid[5] = port;
 
-	mlx4_unicast_detach(dev, &qp, gid, MLX4_PROT_ETH);
+		mlx4_unicast_detach(dev, &qp, gid, MLX4_PROT_ETH);
+		break;
+	}
+	case MLX4_STEERING_MODE_DEVICE_MANAGED: {
+		mlx4_flow_detach(dev, reg_id);
+		break;
+	}
+	default:
+		mlx4_err(dev, "Invalid steering mode.\n");
+	}
 }
 
 static int validate_index(struct mlx4_dev *dev,
@@ -144,6 +188,7 @@ int mlx4_get_eth_qp(struct mlx4_dev *dev, u8 port, u64 mac, int *qpn)
 	struct mlx4_mac_entry *entry;
 	int index = 0;
 	int err = 0;
+	u64 reg_id;
 
 	mlx4_dbg(dev, "Registering MAC: 0x%llx for adding\n",
 			(unsigned long long) mac);
@@ -167,7 +212,7 @@ int mlx4_get_eth_qp(struct mlx4_dev *dev, u8 port, u64 mac, int *qpn)
 		goto qp_err;
 	}
 
-	err = mlx4_uc_steer_add(dev, port, mac, qpn);
+	err = mlx4_uc_steer_add(dev, port, mac, qpn, &reg_id);
 	if (err)
 		goto steer_err;
 
@@ -177,6 +222,7 @@ int mlx4_get_eth_qp(struct mlx4_dev *dev, u8 port, u64 mac, int *qpn)
 		goto alloc_err;
 	}
 	entry->mac = mac;
+	entry->reg_id = reg_id;
 	err = radix_tree_insert(&info->mac_tree, *qpn, entry);
 	if (err)
 		goto insert_err;
@@ -186,7 +232,7 @@ insert_err:
 	kfree(entry);
 
 alloc_err:
-	mlx4_uc_steer_release(dev, port, mac, *qpn);
+	mlx4_uc_steer_release(dev, port, mac, *qpn, reg_id);
 
 steer_err:
 	mlx4_qp_release_range(dev, *qpn, 1);
@@ -212,7 +258,8 @@ void mlx4_put_eth_qp(struct mlx4_dev *dev, u8 port, u64 mac, int qpn)
 			mlx4_dbg(dev, "Releasing qp: port %d, mac 0x%llx,"
 				 " qpn %d\n", port,
 				 (unsigned long long) mac, qpn);
-			mlx4_uc_steer_release(dev, port, entry->mac, qpn);
+			mlx4_uc_steer_release(dev, port, entry->mac,
+					      qpn, entry->reg_id);
 			mlx4_qp_release_range(dev, qpn, 1);
 			radix_tree_delete(&info->mac_tree, qpn);
 			kfree(entry);
@@ -363,11 +410,14 @@ int mlx4_replace_mac(struct mlx4_dev *dev, u8 port, int qpn, u64 new_mac)
 		entry = radix_tree_lookup(&info->mac_tree, qpn);
 		if (!entry)
 			return -EINVAL;
-		mlx4_uc_steer_release(dev, port, entry->mac, qpn);
+		mlx4_uc_steer_release(dev, port, entry->mac,
+				      qpn, entry->reg_id);
 		mlx4_unregister_mac(dev, port, entry->mac);
 		entry->mac = new_mac;
+		entry->reg_id = 0;
 		mlx4_register_mac(dev, port, new_mac);
-		err = mlx4_uc_steer_add(dev, port, entry->mac, &qpn);
+		err = mlx4_uc_steer_add(dev, port, entry->mac,
+					&qpn, &entry->reg_id);
 		return err;
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/profile.c b/drivers/net/ethernet/mellanox/mlx4/profile.c
index b83bc92..9ee4725 100644
--- a/drivers/net/ethernet/mellanox/mlx4/profile.c
+++ b/drivers/net/ethernet/mellanox/mlx4/profile.c
@@ -237,13 +237,19 @@ u64 mlx4_make_profile(struct mlx4_dev *dev,
 			init_hca->mtt_base	 = profile[i].start;
 			break;
 		case MLX4_RES_MCG:
-			dev->caps.num_mgms	  = profile[i].num >> 1;
-			dev->caps.num_amgms	  = profile[i].num >> 1;
 			init_hca->mc_base	  = profile[i].start;
 			init_hca->log_mc_entry_sz =
 					ilog2(mlx4_get_mgm_entry_size(dev));
 			init_hca->log_mc_table_sz = profile[i].log_num;
-			init_hca->log_mc_hash_sz  = profile[i].log_num - 1;
+			if (dev->caps.steering_mode ==
+			    MLX4_STEERING_MODE_DEVICE_MANAGED) {
+				dev->caps.num_mgms = profile[i].num;
+			} else {
+				init_hca->log_mc_hash_sz =
+						profile[i].log_num - 1;
+				dev->caps.num_mgms = profile[i].num >> 1;
+				dev->caps.num_amgms = profile[i].num >> 1;
+			}
 			break;
 		default:
 			break;
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 4301aa5..ce73772 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -2746,6 +2746,9 @@ int mlx4_QP_FLOW_STEERING_ATTACH_wrapper(struct mlx4_dev *dev, int slave,
 					 struct mlx4_cmd_mailbox *outbox,
 					 struct mlx4_cmd_info *cmd)
 {
+	if (dev->caps.steering_mode !=
+	    MLX4_STEERING_MODE_DEVICE_MANAGED)
+		return -EOPNOTSUPP;
 	return mlx4_cmd_imm(dev, inbox->dma, &vhcr->out_param,
 			    vhcr->in_modifier, 0,
 			    MLX4_QP_FLOW_STEERING_ATTACH,
@@ -2759,6 +2762,9 @@ int mlx4_QP_FLOW_STEERING_DETACH_wrapper(struct mlx4_dev *dev, int slave,
 					 struct mlx4_cmd_mailbox *outbox,
 					 struct mlx4_cmd_info *cmd)
 {
+	if (dev->caps.steering_mode !=
+	    MLX4_STEERING_MODE_DEVICE_MANAGED)
+		return -EOPNOTSUPP;
 	return mlx4_cmd(dev, vhcr->in_param, 0, 0,
 			MLX4_QP_FLOW_STEERING_DETACH, MLX4_CMD_TIME_CLASS_A,
 			MLX4_CMD_NATIVE);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 80891a4..a22a851 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -70,14 +70,17 @@ enum {
 	MLX4_MFUNC_EQE_MASK     = (MLX4_MFUNC_MAX_EQES - 1)
 };
 
-/* Driver supports 2 diffrent device methods to manage traffic steering:
+/* Driver supports 3 diffrent device methods to manage traffic steering:
+	-device managed - High level API for ib and eth flow steering. FW is
+			  managing flow steering tables.
 	- B0 steering mode - Common low level API for ib and (if supported) eth.
 	- A0 steering mode - Limited low level API for eth. In case of IB,
 			     B0 mode is in use.
  */
 enum {
 	MLX4_STEERING_MODE_A0,
-	MLX4_STEERING_MODE_B0
+	MLX4_STEERING_MODE_B0,
+	MLX4_STEERING_MODE_DEVICE_MANAGED
 };
 
 static inline const char *mlx4_steering_mode_str(int steering_mode)
@@ -88,6 +91,10 @@ static inline const char *mlx4_steering_mode_str(int steering_mode)
 
 	case MLX4_STEERING_MODE_B0:
 		return "B0 steering";
+
+	case MLX4_STEERING_MODE_DEVICE_MANAGED:
+		return "Device managed flow steering";
+
 	default:
 		return "Unrecognize steering mode";
 	}
@@ -125,7 +132,8 @@ enum {
 enum {
 	MLX4_DEV_CAP_FLAG2_RSS			= 1LL <<  0,
 	MLX4_DEV_CAP_FLAG2_RSS_TOP		= 1LL <<  1,
-	MLX4_DEV_CAP_FLAG2_RSS_XOR		= 1LL <<  2
+	MLX4_DEV_CAP_FLAG2_RSS_XOR		= 1LL <<  2,
+	MLX4_DEV_CAP_FLAG2_FS_EN		= 1LL <<  3
 };
 
 #define MLX4_ATTR_EXTENDED_PORT_INFO	cpu_to_be16(0xff90)
@@ -319,6 +327,7 @@ struct mlx4_caps {
 	int			reserved_mcgs;
 	int			num_qp_per_mgm;
 	int			steering_mode;
+	int			fs_log_max_ucast_qp_range_size;
 	int			num_pds;
 	int			reserved_pds;
 	int			max_xrcds;
@@ -647,9 +656,94 @@ int mlx4_unicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
 int mlx4_unicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
 			enum mlx4_protocol prot);
 int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
-			  int block_mcast_loopback, enum mlx4_protocol protocol);
+			  u8 port, int block_mcast_loopback,
+			  enum mlx4_protocol protocol, u64 *reg_id);
 int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
-			  enum mlx4_protocol protocol);
+			  enum mlx4_protocol protocol, u64 reg_id);
+
+enum {
+	MLX4_DOMAIN_UVERBS	= 0x1000,
+	MLX4_DOMAIN_ETHTOOL     = 0x2000,
+	MLX4_DOMAIN_RFS         = 0x3000,
+	MLX4_DOMAIN_NIC    = 0x5000,
+};
+
+enum mlx4_net_trans_rule_id {
+	MLX4_NET_TRANS_RULE_ID_ETH = 0,
+	MLX4_NET_TRANS_RULE_ID_IB,
+	MLX4_NET_TRANS_RULE_ID_IPV6,
+	MLX4_NET_TRANS_RULE_ID_IPV4,
+	MLX4_NET_TRANS_RULE_ID_TCP,
+	MLX4_NET_TRANS_RULE_ID_UDP,
+	MLX4_NET_TRANS_RULE_NUM, /* should be last */
+};
+
+enum mlx4_net_trans_promisc_mode {
+	MLX4_FS_PROMISC_NONE = 0,
+	MLX4_FS_PROMISC_UPLINK,
+	MLX4_FS_PROMISC_FUNCTION_PORT,
+	MLX4_FS_PROMISC_ALL_MULTI,
+};
+
+struct mlx4_spec_eth {
+	u8	dst_mac[6];
+	u8	dst_mac_msk[6];
+	u8	src_mac[6];
+	u8	src_mac_msk[6];
+	u8	ether_type_enable;
+	__be16	ether_type;
+	__be16	vlan_id_msk;
+	__be16	vlan_id;
+};
+
+struct mlx4_spec_tcp_udp {
+	__be16 dst_port;
+	__be16 dst_port_msk;
+	__be16 src_port;
+	__be16 src_port_msk;
+};
+
+struct mlx4_spec_ipv4 {
+	__be32 dst_ip;
+	__be32 dst_ip_msk;
+	__be32 src_ip;
+	__be32 src_ip_msk;
+};
+
+struct mlx4_spec_ib {
+	__be32	r_qpn;
+	__be32	qpn_msk;
+	u8	dst_gid[16];
+	u8	dst_gid_msk[16];
+};
+
+struct mlx4_spec_list {
+	struct	list_head list;
+	enum	mlx4_net_trans_rule_id id;
+	union {
+		struct mlx4_spec_eth eth;
+		struct mlx4_spec_ib ib;
+		struct mlx4_spec_ipv4 ipv4;
+		struct mlx4_spec_tcp_udp tcp_udp;
+	};
+};
+
+enum mlx4_net_trans_hw_rule_queue {
+	MLX4_NET_TRANS_Q_FIFO,
+	MLX4_NET_TRANS_Q_LIFO,
+};
+
+struct mlx4_net_trans_rule {
+	struct	list_head list;
+	enum	mlx4_net_trans_hw_rule_queue queue_mode;
+	bool	exclusive;
+	bool	allow_loopback;
+	enum	mlx4_net_trans_promisc_mode promisc_mode;
+	u8	port;
+	u16	priority;
+	u32	qpn;
+};
+
 int mlx4_multicast_promisc_add(struct mlx4_dev *dev, u32 qpn, u8 port);
 int mlx4_multicast_promisc_remove(struct mlx4_dev *dev, u32 qpn, u8 port);
 int mlx4_unicast_promisc_add(struct mlx4_dev *dev, u32 qpn, u8 port);
@@ -692,4 +786,8 @@ int mlx4_wol_write(struct mlx4_dev *dev, u64 config, int port);
 int mlx4_counter_alloc(struct mlx4_dev *dev, u32 *idx);
 void mlx4_counter_free(struct mlx4_dev *dev, u32 idx);
 
+int mlx4_flow_attach(struct mlx4_dev *dev,
+		     struct mlx4_net_trans_rule *rule, u64 *reg_id);
+int mlx4_flow_detach(struct mlx4_dev *dev, u64 reg_id);
+
 #endif /* MLX4_DEVICE_H */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH net-next 07/10] net/mlx4_core: Add resource tracking for device managed flow steering rules
  2012-07-01  9:43 [PATCH net-next 00/10] net/mlx4: Add flow-steering support Or Gerlitz
                   ` (5 preceding siblings ...)
  2012-07-01  9:43 ` [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API Or Gerlitz
@ 2012-07-01  9:43 ` Or Gerlitz
  2012-07-01 10:22   ` David Miller
  2012-07-01  9:43 ` [PATCH net-next 08/10] net/mlx4: Implement promiscuous mode with device managed flow-steering Or Gerlitz
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Or Gerlitz @ 2012-07-01  9:43 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, Hadar Hen Zion, Or Gerlitz

From: Hadar Hen Zion <hadarh@mellanox.co.il>

As with other device resources, the resource tracker is needed for supporting
device managed flow steering rules under SRIOV: make sure virtual functions
delete only rules created by them, and clean all rules attached by a crashed VF.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |    1 +
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |  132 ++++++++++++++++++--
 2 files changed, 125 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index 0084967..d2c436b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -149,6 +149,7 @@ enum mlx4_resource {
 	RES_VLAN,
 	RES_EQ,
 	RES_COUNTER,
+	RES_FS_RULE,
 	MLX4_NUM_OF_RESOURCE_TYPE
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index ce73772..b4480de 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -190,6 +190,15 @@ struct res_xrcdn {
 	int			port;
 };
 
+enum res_fs_rule_states {
+	RES_FS_RULE_BUSY = RES_ANY_BUSY,
+	RES_FS_RULE_ALLOCATED,
+};
+
+struct res_fs_rule {
+	struct res_common	com;
+};
+
 static struct res_common *res_tracker_lookup(struct rb_root *root, u64 res_id)
 {
 	struct rb_node *node = root->rb_node;
@@ -245,6 +254,7 @@ static const char *ResourceType(enum mlx4_resource rt)
 	case RES_MAC: return  "RES_MAC";
 	case RES_EQ: return "RES_EQ";
 	case RES_COUNTER: return "RES_COUNTER";
+	case RES_FS_RULE: return "RES_FS_RULE";
 	case RES_XRCD: return "RES_XRCD";
 	default: return "Unknown resource type !!!";
 	};
@@ -516,6 +526,20 @@ static struct res_common *alloc_xrcdn_tr(int id)
 	return &ret->com;
 }
 
+static struct res_common *alloc_fs_rule_tr(u64 id)
+{
+	struct res_fs_rule *ret;
+
+	ret = kzalloc(sizeof *ret, GFP_KERNEL);
+	if (!ret)
+		return NULL;
+
+	ret->com.res_id = id;
+	ret->com.state = RES_FS_RULE_ALLOCATED;
+
+	return &ret->com;
+}
+
 static struct res_common *alloc_tr(u64 id, enum mlx4_resource type, int slave,
 				   int extra)
 {
@@ -549,6 +573,9 @@ static struct res_common *alloc_tr(u64 id, enum mlx4_resource type, int slave,
 	case RES_XRCD:
 		ret = alloc_xrcdn_tr(id);
 		break;
+	case RES_FS_RULE:
+		ret = alloc_fs_rule_tr(id);
+		break;
 	default:
 		return NULL;
 	}
@@ -681,6 +708,16 @@ static int remove_xrcdn_ok(struct res_xrcdn *res)
 	return 0;
 }
 
+static int remove_fs_rule_ok(struct res_fs_rule *res)
+{
+	if (res->com.state == RES_FS_RULE_BUSY)
+		return -EBUSY;
+	else if (res->com.state != RES_FS_RULE_ALLOCATED)
+		return -EPERM;
+
+	return 0;
+}
+
 static int remove_cq_ok(struct res_cq *res)
 {
 	if (res->com.state == RES_CQ_BUSY)
@@ -722,6 +759,8 @@ static int remove_ok(struct res_common *res, enum mlx4_resource type, int extra)
 		return remove_counter_ok((struct res_counter *)res);
 	case RES_XRCD:
 		return remove_xrcdn_ok((struct res_xrcdn *)res);
+	case RES_FS_RULE:
+		return remove_fs_rule_ok((struct res_fs_rule *)res);
 	default:
 		return -EINVAL;
 	}
@@ -2746,14 +2785,28 @@ int mlx4_QP_FLOW_STEERING_ATTACH_wrapper(struct mlx4_dev *dev, int slave,
 					 struct mlx4_cmd_mailbox *outbox,
 					 struct mlx4_cmd_info *cmd)
 {
+	int err;
+
 	if (dev->caps.steering_mode !=
 	    MLX4_STEERING_MODE_DEVICE_MANAGED)
 		return -EOPNOTSUPP;
-	return mlx4_cmd_imm(dev, inbox->dma, &vhcr->out_param,
-			    vhcr->in_modifier, 0,
-			    MLX4_QP_FLOW_STEERING_ATTACH,
-			    MLX4_CMD_TIME_CLASS_A,
-			    MLX4_CMD_NATIVE);
+
+	err = mlx4_cmd_imm(dev, inbox->dma, &vhcr->out_param,
+			   vhcr->in_modifier, 0,
+			   MLX4_QP_FLOW_STEERING_ATTACH, MLX4_CMD_TIME_CLASS_A,
+			   MLX4_CMD_NATIVE);
+	if (err)
+		return err;
+
+	err = add_res_range(dev, slave, vhcr->out_param, 1, RES_FS_RULE, 0);
+	if (err) {
+		mlx4_err(dev, "Fail to add flow steering resources.\n ");
+		/* detach rule*/
+		mlx4_cmd(dev, vhcr->out_param, 0, 0,
+			 MLX4_QP_FLOW_STEERING_ATTACH, MLX4_CMD_TIME_CLASS_A,
+			 MLX4_CMD_NATIVE);
+	}
+	return err;
 }
 
 int mlx4_QP_FLOW_STEERING_DETACH_wrapper(struct mlx4_dev *dev, int slave,
@@ -2762,12 +2815,22 @@ int mlx4_QP_FLOW_STEERING_DETACH_wrapper(struct mlx4_dev *dev, int slave,
 					 struct mlx4_cmd_mailbox *outbox,
 					 struct mlx4_cmd_info *cmd)
 {
+	int err;
+
 	if (dev->caps.steering_mode !=
 	    MLX4_STEERING_MODE_DEVICE_MANAGED)
 		return -EOPNOTSUPP;
-	return mlx4_cmd(dev, vhcr->in_param, 0, 0,
-			MLX4_QP_FLOW_STEERING_DETACH, MLX4_CMD_TIME_CLASS_A,
-			MLX4_CMD_NATIVE);
+
+	err = rem_res_range(dev, slave, vhcr->in_param, 1, RES_FS_RULE, 0);
+	if (err) {
+		mlx4_err(dev, "Fail to remove flow steering resources.\n ");
+		return err;
+	}
+
+	err = mlx4_cmd(dev, vhcr->in_param, 0, 0,
+		       MLX4_QP_FLOW_STEERING_DETACH, MLX4_CMD_TIME_CLASS_A,
+		       MLX4_CMD_NATIVE);
+	return err;
 }
 
 enum {
@@ -3179,6 +3242,58 @@ static void rem_slave_mtts(struct mlx4_dev *dev, int slave)
 	spin_unlock_irq(mlx4_tlock(dev));
 }
 
+static void rem_slave_fs_rule(struct mlx4_dev *dev, int slave)
+{
+	struct mlx4_priv *priv = mlx4_priv(dev);
+	struct mlx4_resource_tracker *tracker =
+		&priv->mfunc.master.res_tracker;
+	struct list_head *fs_rule_list =
+		&tracker->slave_list[slave].res_list[RES_FS_RULE];
+	struct res_fs_rule *fs_rule;
+	struct res_fs_rule *tmp;
+	int state;
+	u64 base;
+	int err;
+
+	err = move_all_busy(dev, slave, RES_FS_RULE);
+	if (err)
+		mlx4_warn(dev, "rem_slave_fs_rule: Could not move all mtts to busy for slave %d\n",
+			  slave);
+
+	spin_lock_irq(mlx4_tlock(dev));
+	list_for_each_entry_safe(fs_rule, tmp, fs_rule_list, com.list) {
+		spin_unlock_irq(mlx4_tlock(dev));
+		if (fs_rule->com.owner == slave) {
+			base = fs_rule->com.res_id;
+			state = fs_rule->com.from_state;
+			while (state != 0) {
+				switch (state) {
+				case RES_FS_RULE_ALLOCATED:
+					/* detach rule */
+					err = mlx4_cmd(dev, base, 0, 0,
+						MLX4_QP_FLOW_STEERING_DETACH,
+						MLX4_CMD_TIME_CLASS_A,
+						MLX4_CMD_NATIVE);
+
+					spin_lock_irq(mlx4_tlock(dev));
+					rb_erase(&fs_rule->com.node,
+						 &tracker->res_tree[RES_FS_RULE]);
+					list_del(&fs_rule->com.list);
+					spin_unlock_irq(mlx4_tlock(dev));
+					kfree(fs_rule);
+					state = 0;
+					break;
+
+				default:
+					state = 0;
+				}
+			}
+		}
+		spin_lock_irq(mlx4_tlock(dev));
+	}
+	spin_unlock_irq(mlx4_tlock(dev));
+}
+
 static void rem_slave_eqs(struct mlx4_dev *dev, int slave)
 {
 	struct mlx4_priv *priv = mlx4_priv(dev);
@@ -3320,5 +3435,6 @@ void mlx4_delete_all_resources_for_slave(struct mlx4_dev *dev, int slave)
 	rem_slave_mtts(dev, slave);
 	rem_slave_counters(dev, slave);
 	rem_slave_xrcdns(dev, slave);
+	rem_slave_fs_rule(dev, slave);
 	mutex_unlock(&priv->mfunc.master.res_tracker.slave_list[slave].mutex);
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH net-next 08/10] net/mlx4: Implement promiscuous mode with device managed flow-steering
  2012-07-01  9:43 [PATCH net-next 00/10] net/mlx4: Add flow-steering support Or Gerlitz
                   ` (6 preceding siblings ...)
  2012-07-01  9:43 ` [PATCH net-next 07/10] net/mlx4_core: Add resource tracking for device managed flow steering rules Or Gerlitz
@ 2012-07-01  9:43 ` Or Gerlitz
  2012-07-01  9:43 ` [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool Or Gerlitz
  2012-07-01  9:43 ` [PATCH net-next 10/10] net/mlx4_en: Add support for drop action through ethtool Or Gerlitz
  9 siblings, 0 replies; 45+ messages in thread
From: Or Gerlitz @ 2012-07-01  9:43 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, Hadar Hen Zion, Or Gerlitz

From: Hadar Hen Zion <hadarh@mellanox.co.il>

The device managed flow steering API has three promiscuous modes:

1. Uplink - captures all the packets that arrive to the port.
2. Allmulti - captures all multicast packets arriving to the port.
3. Function port - for future use, this mode is not implemented yet.

Use these modes with the flow_attach and flow_detach firmware commands
according to the promiscuous state of the netdevice.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   41 ++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/mcg.c       |   60 ++++++++++++++++++++++++
 include/linux/mlx4/device.h                    |    8 +++
 3 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 62f0601..46b58d8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -299,6 +299,16 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 
 			/* Enable promiscouos mode */
 			switch (mdev->dev->caps.steering_mode) {
+			case MLX4_STEERING_MODE_DEVICE_MANAGED:
+				err = mlx4_flow_steer_promisc_add(mdev->dev,
+								  priv->port,
+								  priv->base_qpn,
+								  MLX4_FS_PROMISC_UPLINK);
+				if (err)
+					en_err(priv, "Failed enabling promiscuous mode\n");
+				priv->flags |= MLX4_EN_FLAG_MC_PROMISC;
+				break;
+
 			case MLX4_STEERING_MODE_B0:
 				err = mlx4_unicast_promisc_add(mdev->dev,
 							       priv->base_qpn,
@@ -354,6 +364,15 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 
 		/* Disable promiscouos mode */
 		switch (mdev->dev->caps.steering_mode) {
+		case MLX4_STEERING_MODE_DEVICE_MANAGED:
+			err = mlx4_flow_steer_promisc_remove(mdev->dev,
+							     priv->port,
+							     MLX4_FS_PROMISC_UPLINK);
+			if (err)
+				en_err(priv, "Failed disabling promiscuous mode\n");
+			priv->flags &= ~MLX4_EN_FLAG_MC_PROMISC;
+			break;
+
 		case MLX4_STEERING_MODE_B0:
 			err = mlx4_unicast_promisc_remove(mdev->dev,
 							  priv->base_qpn,
@@ -396,6 +415,13 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 		/* Add the default qp number as multicast promisc */
 		if (!(priv->flags & MLX4_EN_FLAG_MC_PROMISC)) {
 			switch (mdev->dev->caps.steering_mode) {
+			case MLX4_STEERING_MODE_DEVICE_MANAGED:
+				err = mlx4_flow_steer_promisc_add(mdev->dev,
+								  priv->port,
+								  priv->base_qpn,
+								  MLX4_FS_PROMISC_ALL_MULTI);
+				break;
+
 			case MLX4_STEERING_MODE_B0:
 				err = mlx4_multicast_promisc_add(mdev->dev,
 								 priv->base_qpn,
@@ -413,6 +439,12 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
 		/* Disable Multicast promisc */
 		if (priv->flags & MLX4_EN_FLAG_MC_PROMISC) {
 			switch (mdev->dev->caps.steering_mode) {
+			case MLX4_STEERING_MODE_DEVICE_MANAGED:
+				err = mlx4_flow_steer_promisc_remove(mdev->dev,
+								     priv->port,
+								     MLX4_FS_PROMISC_ALL_MULTI);
+				break;
+
 			case MLX4_STEERING_MODE_B0:
 				err = mlx4_multicast_promisc_remove(mdev->dev,
 								    priv->base_qpn,
@@ -836,6 +868,15 @@ int mlx4_en_start_port(struct net_device *dev)
 
 	/* Must redo promiscuous mode setup. */
 	priv->flags &= ~(MLX4_EN_FLAG_PROMISC | MLX4_EN_FLAG_MC_PROMISC);
+	if (mdev->dev->caps.steering_mode ==
+	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
+		mlx4_flow_steer_promisc_remove(mdev->dev,
+						priv->port,
+						MLX4_FS_PROMISC_UPLINK);
+		mlx4_flow_steer_promisc_remove(mdev->dev,
+						priv->port,
+						MLX4_FS_PROMISC_ALL_MULTI);
+	}
 
 	/* Schedule multicast task to populate multicast list */
 	queue_work(mdev->workqueue, &priv->mcast_task);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index c567655..c0c9751 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -1295,6 +1295,66 @@ int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16],
 }
 EXPORT_SYMBOL_GPL(mlx4_multicast_detach);
 
+int mlx4_flow_steer_promisc_add(struct mlx4_dev *dev, u8 port,
+				u32 qpn, enum mlx4_net_trans_promisc_mode mode)
+{
+	struct mlx4_net_trans_rule rule;
+	u64 *regid_p;
+
+	switch (mode) {
+	case MLX4_FS_PROMISC_UPLINK:
+	case MLX4_FS_PROMISC_FUNCTION_PORT:
+		regid_p = &dev->regid_promisc_array[port];
+		break;
+	case MLX4_FS_PROMISC_ALL_MULTI:
+		regid_p = &dev->regid_allmulti_array[port];
+		break;
+	default:
+		return -1;
+	}
+
+	if (*regid_p != 0)
+		return -1;
+
+	rule.promisc_mode = mode;
+	rule.port = port;
+	rule.qpn = qpn;
+	INIT_LIST_HEAD(&rule.list);
+	mlx4_err(dev, "going promisc on %x\n", port);
+
+	return  mlx4_flow_attach(dev, &rule, regid_p);
+}
+EXPORT_SYMBOL_GPL(mlx4_flow_steer_promisc_add);
+
+int mlx4_flow_steer_promisc_remove(struct mlx4_dev *dev, u8 port,
+				   enum mlx4_net_trans_promisc_mode mode)
+{
+	int ret;
+	u64 *regid_p;
+
+	switch (mode) {
+	case MLX4_FS_PROMISC_UPLINK:
+	case MLX4_FS_PROMISC_FUNCTION_PORT:
+		regid_p = &dev->regid_promisc_array[port];
+		break;
+	case MLX4_FS_PROMISC_ALL_MULTI:
+		regid_p = &dev->regid_allmulti_array[port];
+		break;
+	default:
+		return -1;
+	}
+
+	if (*regid_p == 0)
+		return -1;
+
+	ret =  mlx4_flow_detach(dev, *regid_p);
+	if (ret == 0)
+		*regid_p = 0;
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mlx4_flow_steer_promisc_remove);
+
 int mlx4_unicast_attach(struct mlx4_dev *dev,
 			struct mlx4_qp *qp, u8 gid[16],
 			int block_mcast_loopback, enum mlx4_protocol prot)
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index a22a851..692f9db 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -542,6 +542,8 @@ struct mlx4_dev {
 	u8			rev_id;
 	char			board_id[MLX4_BOARD_ID_LEN];
 	int			num_vfs;
+	u64 regid_promisc_array[MLX4_MAX_PORTS + 1];
+	u64 regid_allmulti_array[MLX4_MAX_PORTS + 1];
 };
 
 struct mlx4_init_port_param {
@@ -681,6 +683,7 @@ enum mlx4_net_trans_rule_id {
 enum mlx4_net_trans_promisc_mode {
 	MLX4_FS_PROMISC_NONE = 0,
 	MLX4_FS_PROMISC_UPLINK,
+	/* For future use. Not implemented yet*/
 	MLX4_FS_PROMISC_FUNCTION_PORT,
 	MLX4_FS_PROMISC_ALL_MULTI,
 };
@@ -744,6 +747,11 @@ struct mlx4_net_trans_rule {
 	u32	qpn;
 };
 
+int mlx4_flow_steer_promisc_add(struct mlx4_dev *dev, u8 port,
+				u32 qpn,
+				enum mlx4_net_trans_promisc_mode mode);
+int mlx4_flow_steer_promisc_remove(struct mlx4_dev *dev, u8 port,
+				   enum mlx4_net_trans_promisc_mode mode);
 int mlx4_multicast_promisc_add(struct mlx4_dev *dev, u32 qpn, u8 port);
 int mlx4_multicast_promisc_remove(struct mlx4_dev *dev, u32 qpn, u8 port);
 int mlx4_unicast_promisc_add(struct mlx4_dev *dev, u32 qpn, u8 port);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-01  9:43 [PATCH net-next 00/10] net/mlx4: Add flow-steering support Or Gerlitz
                   ` (7 preceding siblings ...)
  2012-07-01  9:43 ` [PATCH net-next 08/10] net/mlx4: Implement promiscuous mode with device managed flow-steering Or Gerlitz
@ 2012-07-01  9:43 ` Or Gerlitz
  2012-07-01 10:23   ` David Miller
  2012-07-01 16:00   ` Ben Hutchings
  2012-07-01  9:43 ` [PATCH net-next 10/10] net/mlx4_en: Add support for drop action through ethtool Or Gerlitz
  9 siblings, 2 replies; 45+ messages in thread
From: Or Gerlitz @ 2012-07-01  9:43 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, Hadar Hen Zion, Or Gerlitz

From: Hadar Hen Zion <hadarh@mellanox.co.il>

Implement the ethtool APIs for attaching L2/L3/L4 based flow steering
rules to the netdevice RX rings. Added set_rxnfc callback and enhanced
the existing get_rxnfc callback.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |  373 +++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |    7 +
 2 files changed, 380 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 72901ce..30de264 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -38,6 +38,10 @@
 #include "mlx4_en.h"
 #include "en_port.h"
 
+#define EN_ETHTOOL_QP_ATTACH (1ull << 63)
+#define EN_ETHTOOL_MAC_MASK 0xffffffffffffULL
+#define EN_ETHTOOL_SHORT_MASK cpu_to_be16(0xffff)
+#define EN_ETHTOOL_WORD_MASK  cpu_to_be32(0xffffffff)
 
 static void
 mlx4_en_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *drvinfo)
@@ -599,16 +603,360 @@ static int mlx4_en_set_rxfh_indir(struct net_device *dev,
 	return err;
 }
 
+#define not_all_zeros_or_all_ones(field, type) \
+	(field && (type)~field)
+
+static int mlx4_en_validate_flow(struct net_device *dev,
+				 struct ethtool_rxnfc *cmd)
+{
+	struct ethtool_usrip4_spec *l3_mask;
+	struct ethtool_tcpip4_spec *l4_mask;
+	struct ethhdr *eth_mask;
+	u64 full_mac = ~0ull;
+	u64 zero_mac = 0;
+
+	if (cmd->fs.location >= MAX_NUM_OF_FS_RULES)
+		return -EINVAL;
+
+	switch (cmd->fs.flow_type & ~FLOW_EXT) {
+	case TCP_V4_FLOW:
+	case UDP_V4_FLOW:
+		if (cmd->fs.h_u.tcp_ip4_spec.tos)
+			return -EOPNOTSUPP;
+		l4_mask = &cmd->fs.m_u.tcp_ip4_spec;
+		/* don't allow mask which isn't all 0 or 1 */
+		if (not_all_zeros_or_all_ones(l4_mask->ip4src, __be32) ||
+		    not_all_zeros_or_all_ones(l4_mask->ip4dst, __be32) ||
+		    not_all_zeros_or_all_ones(l4_mask->psrc, __be16) ||
+		    not_all_zeros_or_all_ones(l4_mask->pdst, __be16))
+			return -EOPNOTSUPP;
+		break;
+	case IP_USER_FLOW:
+		l3_mask = &cmd->fs.m_u.usr_ip4_spec;
+		if (cmd->fs.h_u.usr_ip4_spec.l4_4_bytes ||
+		    cmd->fs.h_u.usr_ip4_spec.tos ||
+		    cmd->fs.h_u.usr_ip4_spec.proto ||
+		    cmd->fs.h_u.usr_ip4_spec.ip_ver != ETH_RX_NFC_IP4 ||
+		    (!cmd->fs.h_u.usr_ip4_spec.ip4src &&
+		     !cmd->fs.h_u.usr_ip4_spec.ip4dst) ||
+		    not_all_zeros_or_all_ones(l3_mask->ip4src, __be32) ||
+		    not_all_zeros_or_all_ones(l3_mask->ip4dst, __be32))
+			return -EOPNOTSUPP;
+		break;
+	case ETHER_FLOW:
+		eth_mask = &cmd->fs.m_u.ether_spec;
+		if (memcmp(eth_mask->h_source, &zero_mac, ETH_ALEN))
+			return -EOPNOTSUPP;
+		if (!memcmp(eth_mask->h_dest, &zero_mac, ETH_ALEN))
+			return -EOPNOTSUPP;
+		if (not_all_zeros_or_all_ones(eth_mask->h_proto, __be16) ||
+		    (memcmp(eth_mask->h_dest, &zero_mac, ETH_ALEN) &&
+		     memcmp(eth_mask->h_dest, &full_mac, ETH_ALEN)))
+			return -EOPNOTSUPP;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	if ((cmd->fs.flow_type & FLOW_EXT)) {
+		if (cmd->fs.m_ext.vlan_etype ||
+		    not_all_zeros_or_all_ones(cmd->fs.m_ext.vlan_tci,
+					       __be16)) {
+			return -EOPNOTSUPP;
+		}
+	}
+
+	return 0;
+}
+
+static void add_ip_rule(struct mlx4_en_priv *priv,
+			struct ethtool_rxnfc *cmd,
+			struct list_head *list_h)
+{
+	struct mlx4_spec_list *spec_l3;
+
+	spec_l3 = kzalloc(sizeof *spec_l3, GFP_KERNEL);
+	if (!spec_l3) {
+		en_err(priv, "Fail to alloc ethtool rule.\n");
+		return;
+	}
+
+	spec_l3->id = MLX4_NET_TRANS_RULE_ID_IPV4;
+	spec_l3->ipv4.src_ip = cmd->fs.h_u.usr_ip4_spec.ip4src;
+	if (spec_l3->ipv4.src_ip)
+		spec_l3->ipv4.src_ip_msk = EN_ETHTOOL_WORD_MASK;
+	spec_l3->ipv4.dst_ip = cmd->fs.h_u.usr_ip4_spec.ip4dst;
+	if (spec_l3->ipv4.dst_ip)
+		spec_l3->ipv4.dst_ip_msk = EN_ETHTOOL_WORD_MASK;
+	list_add_tail(&spec_l3->list, list_h);
+}
+
+static void add_tcp_udp_rule(struct mlx4_en_priv *priv,
+			     struct ethtool_rxnfc *cmd,
+			     struct list_head *list_h, int proto)
+{
+	struct mlx4_spec_list *spec_l3;
+	struct mlx4_spec_list *spec_l4;
+
+	spec_l3 = kzalloc(sizeof *spec_l3, GFP_KERNEL);
+	spec_l4 = kzalloc(sizeof *spec_l4, GFP_KERNEL);
+	if (!spec_l4 || !spec_l3) {
+		en_err(priv, "Fail to alloc ethtool rule.\n");
+		return;
+	}
+
+	spec_l3->id = MLX4_NET_TRANS_RULE_ID_IPV4;
+
+	if (proto == TCP_V4_FLOW) {
+		spec_l4->id = MLX4_NET_TRANS_RULE_ID_TCP;
+		spec_l3->ipv4.src_ip = cmd->fs.h_u.tcp_ip4_spec.ip4src;
+		spec_l3->ipv4.dst_ip = cmd->fs.h_u.tcp_ip4_spec.ip4dst;
+		spec_l4->tcp_udp.src_port = cmd->fs.h_u.tcp_ip4_spec.psrc;
+		spec_l4->tcp_udp.dst_port = cmd->fs.h_u.tcp_ip4_spec.pdst;
+	} else {
+		spec_l4->id = MLX4_NET_TRANS_RULE_ID_UDP;
+		spec_l3->ipv4.src_ip = cmd->fs.h_u.udp_ip4_spec.ip4src;
+		spec_l3->ipv4.dst_ip = cmd->fs.h_u.udp_ip4_spec.ip4dst;
+		spec_l4->tcp_udp.src_port = cmd->fs.h_u.udp_ip4_spec.psrc;
+		spec_l4->tcp_udp.dst_port = cmd->fs.h_u.udp_ip4_spec.pdst;
+	}
+
+	if (spec_l3->ipv4.src_ip)
+		spec_l3->ipv4.src_ip_msk = EN_ETHTOOL_WORD_MASK;
+	if (spec_l3->ipv4.dst_ip)
+		spec_l3->ipv4.dst_ip_msk = EN_ETHTOOL_WORD_MASK;
+
+	if (spec_l4->tcp_udp.src_port)
+		spec_l4->tcp_udp.src_port_msk = EN_ETHTOOL_SHORT_MASK;
+	if (spec_l4->tcp_udp.dst_port)
+		spec_l4->tcp_udp.dst_port_msk = EN_ETHTOOL_SHORT_MASK;
+
+	list_add_tail(&spec_l3->list, list_h);
+	list_add_tail(&spec_l4->list, list_h);
+}
+
+static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev,
+					     struct ethtool_rxnfc *cmd,
+					     struct list_head *rule_list_h)
+{
+	int err;
+	u64 mac;
+	__be64 be_mac;
+	struct ethhdr *eth_spec;
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_spec_list *spec_l2;
+	__be64 mac_msk = cpu_to_be64(EN_ETHTOOL_MAC_MASK << 16);
+
+	err = mlx4_en_validate_flow(dev, cmd);
+	if (err)
+		return err;
+
+	spec_l2 = kzalloc(sizeof *spec_l2, GFP_KERNEL);
+	if (!spec_l2)
+		return -ENOMEM;
+
+	mac = priv->mac & EN_ETHTOOL_MAC_MASK;
+	be_mac = cpu_to_be64(mac << 16);
+
+	spec_l2->id = MLX4_NET_TRANS_RULE_ID_ETH;
+	memcpy(spec_l2->eth.dst_mac_msk, &mac_msk, ETH_ALEN);
+	if ((cmd->fs.flow_type & ~FLOW_EXT) != ETHER_FLOW)
+		memcpy(spec_l2->eth.dst_mac, &be_mac, ETH_ALEN);
+
+	if ((cmd->fs.flow_type & FLOW_EXT) && cmd->fs.m_ext.vlan_tci) {
+		spec_l2->eth.vlan_id = cmd->fs.h_ext.vlan_tci;
+		spec_l2->eth.vlan_id_msk = cpu_to_be16(0xfff);
+	}
+
+	list_add_tail(&spec_l2->list, rule_list_h);
+
+	switch (cmd->fs.flow_type & ~FLOW_EXT) {
+	case ETHER_FLOW:
+		eth_spec = &cmd->fs.h_u.ether_spec;
+		memcpy(&spec_l2->eth.dst_mac, eth_spec->h_dest, ETH_ALEN);
+		spec_l2->eth.ether_type = eth_spec->h_proto;
+		if (eth_spec->h_proto)
+			spec_l2->eth.ether_type_enable = 1;
+		break;
+	case IP_USER_FLOW:
+		add_ip_rule(priv, cmd, rule_list_h);
+		break;
+	case TCP_V4_FLOW:
+		add_tcp_udp_rule(priv, cmd, rule_list_h, TCP_V4_FLOW);
+		break;
+	case UDP_V4_FLOW:
+		add_tcp_udp_rule(priv, cmd, rule_list_h, UDP_V4_FLOW);
+		break;
+	}
+	return 0;
+}
+
+static int mlx4_en_flow_replace(struct net_device *dev,
+				struct ethtool_rxnfc *cmd)
+{
+	int err;
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct ethtool_flow_id *loc_rule;
+	struct mlx4_spec_list *spec, *tmp_spec;
+	u32 qpn;
+	u64 reg_id;
+
+	struct mlx4_net_trans_rule rule = {
+		.queue_mode = MLX4_NET_TRANS_Q_FIFO,
+		.exclusive = 0,
+		.allow_loopback = 1,
+		.promisc_mode = MLX4_FS_PROMISC_NONE,
+	};
+
+	rule.port = priv->port;
+	rule.priority = MLX4_DOMAIN_ETHTOOL | cmd->fs.location;
+	INIT_LIST_HEAD(&rule.list);
+
+	/* Allow direct QP attaches if the EN_ETHTOOL_QP_ATTACH flag is set */
+	if (cmd->fs.ring_cookie == RX_CLS_FLOW_DISC)
+		return -EOPNOTSUPP;
+	else if (cmd->fs.ring_cookie & EN_ETHTOOL_QP_ATTACH) {
+		qpn = cmd->fs.ring_cookie & (EN_ETHTOOL_QP_ATTACH - 1);
+	} else {
+		if (cmd->fs.ring_cookie >= priv->rx_ring_num) {
+			en_warn(priv, "rxnfc: RX ring (%llu) doesn't exist.\n",
+				cmd->fs.ring_cookie);
+			return -EINVAL;
+		}
+		qpn = priv->rss_map.qps[cmd->fs.ring_cookie].qpn;
+		if (!qpn) {
+			en_warn(priv, "rxnfc: RX ring (%llu) is inactive.\n",
+				cmd->fs.ring_cookie);
+			return -EINVAL;
+		}
+	}
+	rule.qpn = qpn;
+	err = mlx4_en_ethtool_to_net_trans_rule(dev, cmd, &rule.list);
+	if (err)
+		goto out_free_list;
+
+	loc_rule = &priv->ethtool_rules[cmd->fs.location];
+	if (loc_rule->id) {
+		err = mlx4_flow_detach(priv->mdev->dev, loc_rule->id);
+		if (err) {
+			en_err(priv, "Fail to detach network rule at location %d. registration id = %llx\n"
+			       , cmd->fs.location, loc_rule->id);
+			goto out_free_list;
+		}
+		loc_rule->id = 0;
+		memset(&loc_rule->flow_spec, 0,
+		       sizeof(struct ethtool_rx_flow_spec));
+	}
+	err = mlx4_flow_attach(priv->mdev->dev, &rule, &reg_id);
+	if (err) {
+		en_err(priv, "Fail to attach network rule at location %d.\n"
+		       , cmd->fs.location);
+		goto out_free_list;
+	}
+	loc_rule->id = reg_id;
+	memcpy(&loc_rule->flow_spec, &cmd->fs,
+	       sizeof(struct ethtool_rx_flow_spec));
+
+out_free_list:
+	list_for_each_entry_safe(spec, tmp_spec, &rule.list, list) {
+		list_del(&spec->list);
+		kfree(spec);
+	}
+	return err;
+}
+
+static int mlx4_en_flow_detach(struct net_device *dev,
+			       struct ethtool_rxnfc *cmd)
+{
+	int err = 0;
+	struct ethtool_flow_id *rule;
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+
+	if (cmd->fs.location >= MAX_NUM_OF_FS_RULES)
+		return -EINVAL;
+
+	rule = &priv->ethtool_rules[cmd->fs.location];
+	if (!rule->id) {
+		err =  -ENOENT;
+		goto out;
+	}
+
+	err = mlx4_flow_detach(priv->mdev->dev, rule->id);
+	if (err) {
+		en_err(priv, "Fail to detach network rule at location %d. registration id = 0x%llx\n"
+		       , cmd->fs.location, rule->id);
+		goto out;
+	}
+	rule->id = 0;
+	memset(&rule->flow_spec, 0, sizeof(struct ethtool_rx_flow_spec));
+out:
+	return err;
+
+}
+
+static int mlx4_en_get_flow(struct net_device *dev, struct ethtool_rxnfc *cmd,
+			    int loc)
+{
+	int err = 0;
+	struct ethtool_flow_id *rule;
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+
+	if (loc < 0 || loc >= MAX_NUM_OF_FS_RULES)
+		return -EINVAL;
+
+	rule = &priv->ethtool_rules[loc];
+	if (rule->id)
+		memcpy(&cmd->fs, &rule->flow_spec,
+		       sizeof(struct ethtool_rx_flow_spec));
+	else
+		err = -ENOENT;
+
+	return err;
+}
+
+static int mlx4_en_get_num_flows(struct mlx4_en_priv *priv)
+{
+
+	int i, res = 0;
+	for (i = 0; i < MAX_NUM_OF_FS_RULES; i++) {
+		if (priv->ethtool_rules[i].id)
+			res++;
+	}
+	return res;
+
+}
+
 static int mlx4_en_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
 			     u32 *rule_locs)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_en_dev *mdev = priv->mdev;
 	int err = 0;
+	int i = 0, priority = 0;
+
+	if (mdev->dev->caps.steering_mode != MLX4_STEERING_MODE_DEVICE_MANAGED)
+		return -EOPNOTSUPP;
 
 	switch (cmd->cmd) {
 	case ETHTOOL_GRXRINGS:
 		cmd->data = priv->rx_ring_num;
 		break;
+	case ETHTOOL_GRXCLSRLCNT:
+		cmd->rule_cnt = mlx4_en_get_num_flows(priv);
+		break;
+	case ETHTOOL_GRXCLSRULE:
+		err = mlx4_en_get_flow(dev, cmd, cmd->fs.location);
+		break;
+	case ETHTOOL_GRXCLSRLALL:
+		while (!err || err == -ENOENT) {
+			err = mlx4_en_get_flow(dev, cmd, i);
+			if (!err)
+				((u32 *)(rule_locs))[priority++] = i;
+			i++;
+		}
+		if (priority)
+			err = 0;
+		break;
 	default:
 		err = -EOPNOTSUPP;
 		break;
@@ -617,6 +965,30 @@ static int mlx4_en_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
 	return err;
 }
 
+static int mlx4_en_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
+{
+	int err = 0;
+	struct mlx4_en_priv *priv = netdev_priv(dev);
+	struct mlx4_en_dev *mdev = priv->mdev;
+
+	if (mdev->dev->caps.steering_mode != MLX4_STEERING_MODE_DEVICE_MANAGED)
+		return -EOPNOTSUPP;
+
+	switch (cmd->cmd) {
+	case ETHTOOL_SRXCLSRLINS:
+		err = mlx4_en_flow_replace(dev, cmd);
+		break;
+	case ETHTOOL_SRXCLSRLDEL:
+		err = mlx4_en_flow_detach(dev, cmd);
+		break;
+	default:
+		en_warn(priv, "Unsupported ethtool command. (%d)\n", cmd->cmd);
+		return -EOPNOTSUPP;
+	}
+
+	return err;
+}
+
 const struct ethtool_ops mlx4_en_ethtool_ops = {
 	.get_drvinfo = mlx4_en_get_drvinfo,
 	.get_settings = mlx4_en_get_settings,
@@ -637,6 +1009,7 @@ const struct ethtool_ops mlx4_en_ethtool_ops = {
 	.get_ringparam = mlx4_en_get_ringparam,
 	.set_ringparam = mlx4_en_set_ringparam,
 	.get_rxnfc = mlx4_en_get_rxnfc,
+	.set_rxnfc = mlx4_en_set_rxnfc,
 	.get_rxfh_indir_size = mlx4_en_get_rxfh_indir_size,
 	.get_rxfh_indir = mlx4_en_get_rxfh_indir,
 	.set_rxfh_indir = mlx4_en_set_rxfh_indir,
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 2d6dabe..8882e70 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -75,6 +75,7 @@
 #define STAMP_SHIFT		31
 #define STAMP_VAL		0x7fffffff
 #define STATS_DELAY		(HZ / 4)
+#define MAX_NUM_OF_FS_RULES	256
 
 /* Typical TSO descriptor with 16 gather entries is 352 bytes... */
 #define MAX_DESC_SIZE		512
@@ -435,6 +436,11 @@ struct mlx4_en_frag_info {
 
 #endif
 
+struct ethtool_flow_id {
+	struct ethtool_rx_flow_spec flow_spec;
+	u64 id;
+};
+
 struct mlx4_en_priv {
 	struct mlx4_en_dev *mdev;
 	struct mlx4_en_port_profile *prof;
@@ -444,6 +450,7 @@ struct mlx4_en_priv {
 	struct net_device_stats ret_stats;
 	struct mlx4_en_port_state port_state;
 	spinlock_t stats_lock;
+	struct ethtool_flow_id ethtool_rules[MAX_NUM_OF_FS_RULES];
 
 	unsigned long last_moder_packets[MAX_RX_RINGS];
 	unsigned long last_moder_tx_packets;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH net-next 10/10] net/mlx4_en: Add support for drop action through ethtool
  2012-07-01  9:43 [PATCH net-next 00/10] net/mlx4: Add flow-steering support Or Gerlitz
                   ` (8 preceding siblings ...)
  2012-07-01  9:43 ` [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool Or Gerlitz
@ 2012-07-01  9:43 ` Or Gerlitz
  9 siblings, 0 replies; 45+ messages in thread
From: Or Gerlitz @ 2012-07-01  9:43 UTC (permalink / raw)
  To: davem; +Cc: roland, yevgenyp, oren, netdev, Hadar Hen Zion, Or Gerlitz

From: Hadar Hen Zion <hadarh@mellanox.co.il>

The drop action is implemented by allocating a QP and keeping it in a reset state
such that the HW drops any packets which are steered to that QP. When a drop action
is requested, we attach the relevant flow to that QP.

Sign-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |    9 ++++++-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c      |   30 +++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |    3 ++
 4 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 30de264..6c44047 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -814,7 +814,7 @@ static int mlx4_en_flow_replace(struct net_device *dev,
 
 	/* Allow direct QP attaches if the EN_ETHTOOL_QP_ATTACH flag is set */
 	if (cmd->fs.ring_cookie == RX_CLS_FLOW_DISC)
-		return -EOPNOTSUPP;
+		qpn = priv->drop_qp.qpn;
 	else if (cmd->fs.ring_cookie & EN_ETHTOOL_QP_ATTACH) {
 		qpn = cmd->fs.ring_cookie & (EN_ETHTOOL_QP_ATTACH - 1);
 	} else {
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 46b58d8..ed121dc 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -793,6 +793,10 @@ int mlx4_en_start_port(struct net_device *dev)
 		goto mac_err;
 	}
 
+	err = mlx4_en_create_drop_qp(priv);
+	if (err)
+		goto rss_err;
+
 	/* Configure tx cq's and rings */
 	for (i = 0; i < priv->tx_ring_num; i++) {
 		/* Configure cq */
@@ -892,7 +896,8 @@ tx_err:
 		mlx4_en_deactivate_tx_ring(priv, &priv->tx_ring[tx_index]);
 		mlx4_en_deactivate_cq(priv, &priv->tx_cq[tx_index]);
 	}
-
+	mlx4_en_destroy_drop_qp(priv);
+rss_err:
 	mlx4_en_release_rss_steer(priv);
 mac_err:
 	mlx4_put_eth_qp(mdev->dev, priv->port, priv->mac, priv->base_qpn);
@@ -947,6 +952,8 @@ void mlx4_en_stop_port(struct net_device *dev)
 	/* Flush multicast filter */
 	mlx4_SET_MCAST_FLTR(mdev->dev, priv->port, 0, 1, MLX4_MCAST_CONFIG);
 
+	mlx4_en_destroy_drop_qp(priv);
+
 	/* Free TX Rings */
 	for (i = 0; i < priv->tx_ring_num; i++) {
 		mlx4_en_deactivate_tx_ring(priv, &priv->tx_ring[i]);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index d49a7ac..a04cbf7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -844,6 +844,36 @@ out:
 	return err;
 }
 
+int mlx4_en_create_drop_qp(struct mlx4_en_priv *priv)
+{
+	int err;
+	u32 qpn;
+
+	err = mlx4_qp_reserve_range(priv->mdev->dev, 1, 1, &qpn);
+	if (err) {
+		en_err(priv, "Failed reserving drop qpn\n");
+		return err;
+	}
+	err = mlx4_qp_alloc(priv->mdev->dev, qpn, &priv->drop_qp);
+	if (err) {
+		en_err(priv, "Failed allocating drop qp\n");
+		mlx4_qp_release_range(priv->mdev->dev, qpn, 1);
+		return err;
+	}
+
+	return 0;
+}
+
+void mlx4_en_destroy_drop_qp(struct mlx4_en_priv *priv)
+{
+	u32 qpn;
+
+	qpn = priv->drop_qp.qpn;
+	mlx4_qp_remove(priv->mdev->dev, &priv->drop_qp);
+	mlx4_qp_free(priv->mdev->dev, &priv->drop_qp);
+	mlx4_qp_release_range(priv->mdev->dev, qpn, 1);
+}
+
 /* Allocate rx qp's and configure them according to rss map */
 int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 8882e70..a126321 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -500,6 +500,7 @@ struct mlx4_en_priv {
 	struct mlx4_en_rx_ring rx_ring[MAX_RX_RINGS];
 	struct mlx4_en_cq *tx_cq;
 	struct mlx4_en_cq rx_cq[MAX_RX_RINGS];
+	struct mlx4_qp drop_qp;
 	struct work_struct mcast_task;
 	struct work_struct mac_task;
 	struct work_struct watchdog_task;
@@ -586,6 +587,8 @@ void mlx4_en_unmap_buffer(struct mlx4_buf *buf);
 void mlx4_en_calc_rx_buf(struct net_device *dev);
 int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv);
 void mlx4_en_release_rss_steer(struct mlx4_en_priv *priv);
+int mlx4_en_create_drop_qp(struct mlx4_en_priv *priv);
+void mlx4_en_destroy_drop_qp(struct mlx4_en_priv *priv);
 int mlx4_en_free_tx_buf(struct net_device *dev, struct mlx4_en_tx_ring *ring);
 void mlx4_en_rx_irq(struct mlx4_cq *mcq);
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 01/10] net/mlx4_core: Change resource tracking mechanism to use red-black tree
  2012-07-01  9:43 ` [PATCH net-next 01/10] net/mlx4_core: Change resource tracking mechanism to use red-black tree Or Gerlitz
@ 2012-07-01 10:17   ` David Miller
  0 siblings, 0 replies; 45+ messages in thread
From: David Miller @ 2012-07-01 10:17 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, yevgenyp, oren, netdev, hadarh

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Sun,  1 Jul 2012 12:43:34 +0300

> @@ -733,7 +776,7 @@ static int qp_res_start_move_to(struct mlx4_dev *dev, int slave, int qpn,
>  	int err = 0;
>  
>  	spin_lock_irq(mlx4_tlock(dev));
> -	r = radix_tree_lookup(&tracker->res_tree[RES_QP], qpn);
> +	r = (struct res_qp *)res_tracker_lookup(&tracker->res_tree[RES_QP], qpn);
>  	if (!r)
>  		err = -ENOENT;
>  	else if (r->com.owner != slave)

Casts are terrible, return "void *" from res_tracker_lookup() just as
radix_tree_lookup() does.

Also you have indentation problems all over this patch.  When you have
a multi-line function call the first non-whitespace character on the
second and subsequent lines should line up with the first column after
the openning parenthesis on the first line.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 04/10] net/mlx4: Set steering mode according to device capabilities
  2012-07-01  9:43 ` [PATCH net-next 04/10] net/mlx4: Set steering mode according to device capabilities Or Gerlitz
@ 2012-07-01 10:20   ` David Miller
  0 siblings, 0 replies; 45+ messages in thread
From: David Miller @ 2012-07-01 10:20 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, yevgenyp, oren, netdev, hadarh

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Sun,  1 Jul 2012 12:43:37 +0300

> +				/* Add the default qp number as multicast
> +				* promisc */

This is not the correct way to format a comment, do it:

	/* Like
	 * this.
	 */

> +	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER &&
> +	    dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER) {
> +		dev->caps.steering_mode = MLX4_STEERING_MODE_B0;
> +
> +	} else {

That empty line is extraneous and ugly, remove it.

> +/* Driver supports 2 diffrent device methods to manage traffic steering:
> +	- B0 steering mode - Common low level API for ib and (if supported) eth.
> +	- A0 steering mode - Limited low level API for eth. In case of IB,
> +			     B0 mode is in use.
> + */

Improperly formatted, do it like this:

/* Driver supports 2 diffrent device methods to manage traffic steering:
 *	- B0 steering mode - Common low level API for ib and (if supported) eth.
 *	- A0 steering mode - Limited low level API for eth. In case of IB,
 *			     B0 mode is in use.
 */

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 07/10] net/mlx4_core: Add resource tracking for device managed flow steering rules
  2012-07-01  9:43 ` [PATCH net-next 07/10] net/mlx4_core: Add resource tracking for device managed flow steering rules Or Gerlitz
@ 2012-07-01 10:22   ` David Miller
  0 siblings, 0 replies; 45+ messages in thread
From: David Miller @ 2012-07-01 10:22 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, yevgenyp, oren, netdev, hadarh

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Sun,  1 Jul 2012 12:43:40 +0300

> +					err = mlx4_cmd(dev, base, 0, 0,
> +						MLX4_QP_FLOW_STEERING_DETACH,
> +						MLX4_CMD_TIME_CLASS_A,
> +						MLX4_CMD_NATIVE);

This is improperly formatted, line up the arguments with the first column
after the openning parenthesis on the first line.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-01  9:43 ` [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool Or Gerlitz
@ 2012-07-01 10:23   ` David Miller
  2012-07-01 16:00   ` Ben Hutchings
  1 sibling, 0 replies; 45+ messages in thread
From: David Miller @ 2012-07-01 10:23 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, yevgenyp, oren, netdev, hadarh

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Sun,  1 Jul 2012 12:43:42 +0300

> +		en_err(priv, "Fail to detach network rule at location %d. registration id = 0x%llx\n"
> +		       , cmd->fs.location, rule->id);

Put that first comma at the end of the first line, this is very ugly.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-01  9:43 ` [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API Or Gerlitz
@ 2012-07-01 10:30   ` David Miller
  2012-07-01 12:29     ` Or Gerlitz
  0 siblings, 1 reply; 45+ messages in thread
From: David Miller @ 2012-07-01 10:30 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, yevgenyp, oren, netdev, hadarh

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Sun,  1 Jul 2012 12:43:39 +0300

> +		/* Enable Ethernet flow steering
> +		 * with udp unicast and tcp unicast */

Improperly formatted comment, do it like this:

		/* Enable Ethernet flow steering
		 * with udp unicast and tcp unicast
		 */

> +		/* Enable IPoIB flow steering
> +		 * with udp unicast and tcp unicast */

Likewise.

> @@ -136,6 +138,11 @@ module_param_array(port_type_array, int, &arr_argc, 0444);
>  MODULE_PARM_DESC(port_type_array, "Array of port types: HW_DEFAULT (0) is default "
>  				"1 for IB, 2 for Ethernet");
>  
> +static int mlx4_flow_steering_hash;
> +module_param_named(flow_steering_hash, mlx4_flow_steering_hash, int, 0444);
> +MODULE_PARM_DESC(flow_steering_hash,
> +		 "Flow steering hash function configuration. Config options: L2 = 0, L2_L3_L4 = 1 (default: L2).");
> +

No module paramters, do it via ethtool or similar.

> +		if (dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_UC_STEER &&
> +		    dev->caps.flags & MLX4_DEV_CAP_FLAG_VEP_MC_STEER) {
> +			dev->caps.steering_mode = MLX4_STEERING_MODE_B0;
> +
> +		} else {

Get rid of that pointless empty line

> +
> +			dev->caps.steering_mode = MLX4_STEERING_MODE_A0;

Likewise.
> +			/* Enable flow steering with
> +			   udp unicast and tcp unicast*/

Comment formatting.

> +static void trans_rule_ctrl_to_hw(struct mlx4_net_trans_rule *ctrl,
> +				struct mlx4_net_trans_rule_hw_ctrl *hw)

Second line improperly indented.

> +		memcpy(rule_hw->eth.dst_mac_msk, spec->eth.dst_mac_msk,
> +								ETH_ALEN);

Don't make us barf, other people have to read this stuff.  This looks
terrible, indent it properly.

> +		memcpy(rule_hw->eth.src_mac_msk, spec->eth.src_mac_msk,
> +								ETH_ALEN);

Likewise.

> +		mlx4_err(dev, "Fail to detach network rule. registration id = 0x%llx\n"
> +								, reg_id);

Please format this properly.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 03/10] net/mlx4_en: Re-design multicast attachments flow
  2012-07-01  9:43 ` [PATCH net-next 03/10] net/mlx4_en: Re-design multicast attachments flow Or Gerlitz
@ 2012-07-01 10:32   ` David Miller
  0 siblings, 0 replies; 45+ messages in thread
From: David Miller @ 2012-07-01 10:32 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, yevgenyp, oren, netdev, yevgenyp, sbohrer, hadarh

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Sun,  1 Jul 2012 12:43:36 +0300

> +	/* Find all the entries that should be removed from dst,
> +	 * These are the entries that are not found in src */

Improperly formatted comment.

> +			new_mc = kmalloc(sizeof(struct mlx4_en_mc_list),
> +					     GFP_KERNEL);

Improperly indented second line.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-01 10:30   ` David Miller
@ 2012-07-01 12:29     ` Or Gerlitz
  2012-07-01 21:42       ` David Miller
  0 siblings, 1 reply; 45+ messages in thread
From: Or Gerlitz @ 2012-07-01 12:29 UTC (permalink / raw)
  To: David Miller; +Cc: roland, yevgenyp, oren, netdev, hadarh

On 7/1/2012 1:30 PM, David Miller wrote:
[...]
> @@ -136,6 +138,11 @@ module_param_array(port_type_array, int, &arr_argc, 0444);
>   MODULE_PARM_DESC(port_type_array, "Array of port types: HW_DEFAULT (0) is default "
>   				"1 for IB, 2 for Ethernet");
>   
> +static int mlx4_flow_steering_hash;
> +module_param_named(flow_steering_hash, mlx4_flow_steering_hash, int, 0444);
> +MODULE_PARM_DESC(flow_steering_hash,
> +		 "Flow steering hash function configuration. Config options: L2 = 0, L2_L3_L4 = 1 (default: L2).");
> +
>
> No module paramters, do it via ethtool or similar.
>

Dave,

We'll be handling the indentation, comments and other syntax issues you 
have pointed out
and submit V1, hopefully which would be clean from such errors.

Re the usage of the module param - as was pointed out in the change-log, 
please note that
this policy is **global** to the HCA, that is effects all the Ethernet 
(and down the road,
also IPoIB flow-steering, when we run a card with one IB port and one 
Eth port) net-devices
that relate to that device.

The module param is for mlx4_core and not for mlx4_en. In that respect, 
per net-device
ethtool directive wouldn't work, even when there is a dedicated firmware 
command that
allows changing this in run-time, which we don't have now.

My thinking was that once we have this command at hand, we can add 
sysfs/etc entry at
the mlx4_core level to provide a different hash function to the device. 
And for the
time being use the module param, makes sense?

Or.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-01  9:43 ` [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool Or Gerlitz
  2012-07-01 10:23   ` David Miller
@ 2012-07-01 16:00   ` Ben Hutchings
  2012-07-01 16:38     ` Joe Perches
                       ` (3 more replies)
  1 sibling, 4 replies; 45+ messages in thread
From: Ben Hutchings @ 2012-07-01 16:00 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: davem, roland, yevgenyp, oren, netdev, Hadar Hen Zion

On Sun, 2012-07-01 at 12:43 +0300, Or Gerlitz wrote:
> From: Hadar Hen Zion <hadarh@mellanox.co.il>
> 
> Implement the ethtool APIs for attaching L2/L3/L4 based flow steering
> rules to the netdevice RX rings. Added set_rxnfc callback and enhanced
> the existing get_rxnfc callback.
> 
> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |  373 +++++++++++++++++++++++
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |    7 +
>  2 files changed, 380 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> index 72901ce..30de264 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> @@ -38,6 +38,10 @@
>  #include "mlx4_en.h"
>  #include "en_port.h"
>  
> +#define EN_ETHTOOL_QP_ATTACH (1ull << 63)
> +#define EN_ETHTOOL_MAC_MASK 0xffffffffffffULL
> +#define EN_ETHTOOL_SHORT_MASK cpu_to_be16(0xffff)
> +#define EN_ETHTOOL_WORD_MASK  cpu_to_be32(0xffffffff)
>  
>  static void
>  mlx4_en_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *drvinfo)
> @@ -599,16 +603,360 @@ static int mlx4_en_set_rxfh_indir(struct net_device *dev,
>  	return err;
>  }
>  
> +#define not_all_zeros_or_all_ones(field, type) \
> +	(field && (type)~field)
> +
> +static int mlx4_en_validate_flow(struct net_device *dev,
> +				 struct ethtool_rxnfc *cmd)
> +{
> +	struct ethtool_usrip4_spec *l3_mask;
> +	struct ethtool_tcpip4_spec *l4_mask;
> +	struct ethhdr *eth_mask;
> +	u64 full_mac = ~0ull;
> +	u64 zero_mac = 0;
> +
> +	if (cmd->fs.location >= MAX_NUM_OF_FS_RULES)
> +		return -EINVAL;
> +
> +	switch (cmd->fs.flow_type & ~FLOW_EXT) {
> +	case TCP_V4_FLOW:
> +	case UDP_V4_FLOW:
> +		if (cmd->fs.h_u.tcp_ip4_spec.tos)
> +			return -EOPNOTSUPP;

I suspect that your filter ignores TOS, rather than only matching TOS ==
0, so you should actually be checking the corresponding field in the
mask (fs.m_u).  The error code should be EINVAL.

> +		l4_mask = &cmd->fs.m_u.tcp_ip4_spec;
> +		/* don't allow mask which isn't all 0 or 1 */
> +		if (not_all_zeros_or_all_ones(l4_mask->ip4src, __be32) ||
> +		    not_all_zeros_or_all_ones(l4_mask->ip4dst, __be32) ||
> +		    not_all_zeros_or_all_ones(l4_mask->psrc, __be16) ||
> +		    not_all_zeros_or_all_ones(l4_mask->pdst, __be16))
> +			return -EOPNOTSUPP;

Again, here and in many further instances, the error code should be
EINVAL.

> +		break;
> +	case IP_USER_FLOW:
> +		l3_mask = &cmd->fs.m_u.usr_ip4_spec;
> +		if (cmd->fs.h_u.usr_ip4_spec.l4_4_bytes ||
> +		    cmd->fs.h_u.usr_ip4_spec.tos ||

I think this should be checking l4_4_bytes and tos in the mask.

> +		    cmd->fs.h_u.usr_ip4_spec.proto ||
> +		    cmd->fs.h_u.usr_ip4_spec.ip_ver != ETH_RX_NFC_IP4 ||
> +		    (!cmd->fs.h_u.usr_ip4_spec.ip4src &&
> +		     !cmd->fs.h_u.usr_ip4_spec.ip4dst) ||
> +		    not_all_zeros_or_all_ones(l3_mask->ip4src, __be32) ||
> +		    not_all_zeros_or_all_ones(l3_mask->ip4dst, __be32))
> +			return -EOPNOTSUPP;
> +		break;
> +	case ETHER_FLOW:
> +		eth_mask = &cmd->fs.m_u.ether_spec;
> +		if (memcmp(eth_mask->h_source, &zero_mac, ETH_ALEN))
> +			return -EOPNOTSUPP;
> +		if (!memcmp(eth_mask->h_dest, &zero_mac, ETH_ALEN))
> +			return -EOPNOTSUPP;

But in the next statement you test whether eth_mask->h_dest is either
all-zeroes or all-ones.  Is all-zeroes valid or not?  I suspect you
actually intend to reject the case where both h_dest and h_proto are
masked out.

> +		if (not_all_zeros_or_all_ones(eth_mask->h_proto, __be16) ||
> +		    (memcmp(eth_mask->h_dest, &zero_mac, ETH_ALEN) &&
> +		     memcmp(eth_mask->h_dest, &full_mac, ETH_ALEN)))
> +			return -EOPNOTSUPP;
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	if ((cmd->fs.flow_type & FLOW_EXT)) {
> +		if (cmd->fs.m_ext.vlan_etype ||
> +		    not_all_zeros_or_all_ones(cmd->fs.m_ext.vlan_tci,
> +					       __be16)) {
> +			return -EOPNOTSUPP;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static void add_ip_rule(struct mlx4_en_priv *priv,
> +			struct ethtool_rxnfc *cmd,
> +			struct list_head *list_h)
> +{
> +	struct mlx4_spec_list *spec_l3;
> +
> +	spec_l3 = kzalloc(sizeof *spec_l3, GFP_KERNEL);
> +	if (!spec_l3) {
> +		en_err(priv, "Fail to alloc ethtool rule.\n");
> +		return;
> +	}

This should return an error code as well - logging is not a substitue.

> +	spec_l3->id = MLX4_NET_TRANS_RULE_ID_IPV4;
> +	spec_l3->ipv4.src_ip = cmd->fs.h_u.usr_ip4_spec.ip4src;
> +	if (spec_l3->ipv4.src_ip)
> +		spec_l3->ipv4.src_ip_msk = EN_ETHTOOL_WORD_MASK;
> +	spec_l3->ipv4.dst_ip = cmd->fs.h_u.usr_ip4_spec.ip4dst;
> +	if (spec_l3->ipv4.dst_ip)
> +		spec_l3->ipv4.dst_ip_msk = EN_ETHTOOL_WORD_MASK;

The conditions should be using the mask (cmd->fs.m_u) not the value.

> +	list_add_tail(&spec_l3->list, list_h);
> +}
> +
> +static void add_tcp_udp_rule(struct mlx4_en_priv *priv,
> +			     struct ethtool_rxnfc *cmd,
> +			     struct list_head *list_h, int proto)
> +{
> +	struct mlx4_spec_list *spec_l3;
> +	struct mlx4_spec_list *spec_l4;
> +
> +	spec_l3 = kzalloc(sizeof *spec_l3, GFP_KERNEL);
> +	spec_l4 = kzalloc(sizeof *spec_l4, GFP_KERNEL);
> +	if (!spec_l4 || !spec_l3) {
> +		en_err(priv, "Fail to alloc ethtool rule.\n");

If one of them was successfully allocated, it will now be leaked.

> +		return;
> +	}
> +
> +	spec_l3->id = MLX4_NET_TRANS_RULE_ID_IPV4;
> +
> +	if (proto == TCP_V4_FLOW) {
> +		spec_l4->id = MLX4_NET_TRANS_RULE_ID_TCP;
> +		spec_l3->ipv4.src_ip = cmd->fs.h_u.tcp_ip4_spec.ip4src;
> +		spec_l3->ipv4.dst_ip = cmd->fs.h_u.tcp_ip4_spec.ip4dst;
> +		spec_l4->tcp_udp.src_port = cmd->fs.h_u.tcp_ip4_spec.psrc;
> +		spec_l4->tcp_udp.dst_port = cmd->fs.h_u.tcp_ip4_spec.pdst;
> +	} else {
> +		spec_l4->id = MLX4_NET_TRANS_RULE_ID_UDP;
> +		spec_l3->ipv4.src_ip = cmd->fs.h_u.udp_ip4_spec.ip4src;
> +		spec_l3->ipv4.dst_ip = cmd->fs.h_u.udp_ip4_spec.ip4dst;
> +		spec_l4->tcp_udp.src_port = cmd->fs.h_u.udp_ip4_spec.psrc;
> +		spec_l4->tcp_udp.dst_port = cmd->fs.h_u.udp_ip4_spec.pdst;
> +	}
> +
> +	if (spec_l3->ipv4.src_ip)
> +		spec_l3->ipv4.src_ip_msk = EN_ETHTOOL_WORD_MASK;
> +	if (spec_l3->ipv4.dst_ip)
> +		spec_l3->ipv4.dst_ip_msk = EN_ETHTOOL_WORD_MASK;
> +
> +	if (spec_l4->tcp_udp.src_port)
> +		spec_l4->tcp_udp.src_port_msk = EN_ETHTOOL_SHORT_MASK;
> +	if (spec_l4->tcp_udp.dst_port)
> +		spec_l4->tcp_udp.dst_port_msk = EN_ETHTOOL_SHORT_MASK;

All these conditions should be using the mask, not the value.

> +	list_add_tail(&spec_l3->list, list_h);
> +	list_add_tail(&spec_l4->list, list_h);
> +}
> +
> +static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev,
> +					     struct ethtool_rxnfc *cmd,
> +					     struct list_head *rule_list_h)
> +{
> +	int err;
> +	u64 mac;
> +	__be64 be_mac;
> +	struct ethhdr *eth_spec;
> +	struct mlx4_en_priv *priv = netdev_priv(dev);
> +	struct mlx4_spec_list *spec_l2;
> +	__be64 mac_msk = cpu_to_be64(EN_ETHTOOL_MAC_MASK << 16);
> +
> +	err = mlx4_en_validate_flow(dev, cmd);
> +	if (err)
> +		return err;
> +
> +	spec_l2 = kzalloc(sizeof *spec_l2, GFP_KERNEL);
> +	if (!spec_l2)
> +		return -ENOMEM;
> +
> +	mac = priv->mac & EN_ETHTOOL_MAC_MASK;
> +	be_mac = cpu_to_be64(mac << 16);
> +
> +	spec_l2->id = MLX4_NET_TRANS_RULE_ID_ETH;
> +	memcpy(spec_l2->eth.dst_mac_msk, &mac_msk, ETH_ALEN);
> +	if ((cmd->fs.flow_type & ~FLOW_EXT) != ETHER_FLOW)
> +		memcpy(spec_l2->eth.dst_mac, &be_mac, ETH_ALEN);

Does the hardware require filtering by MAC address and not only by layer
3/4 addresses?

> +	if ((cmd->fs.flow_type & FLOW_EXT) && cmd->fs.m_ext.vlan_tci) {
> +		spec_l2->eth.vlan_id = cmd->fs.h_ext.vlan_tci;
> +		spec_l2->eth.vlan_id_msk = cpu_to_be16(0xfff);

This doesn't match mlx4_en_validate_flow(); you are replacing a mask of
0xffff with 0xfff.

> +	}
> +
> +	list_add_tail(&spec_l2->list, rule_list_h);
> +
> +	switch (cmd->fs.flow_type & ~FLOW_EXT) {
> +	case ETHER_FLOW:
> +		eth_spec = &cmd->fs.h_u.ether_spec;
> +		memcpy(&spec_l2->eth.dst_mac, eth_spec->h_dest, ETH_ALEN);
> +		spec_l2->eth.ether_type = eth_spec->h_proto;
> +		if (eth_spec->h_proto)
> +			spec_l2->eth.ether_type_enable = 1;
> +		break;
> +	case IP_USER_FLOW:
> +		add_ip_rule(priv, cmd, rule_list_h);
> +		break;
> +	case TCP_V4_FLOW:
> +		add_tcp_udp_rule(priv, cmd, rule_list_h, TCP_V4_FLOW);
> +		break;
> +	case UDP_V4_FLOW:
> +		add_tcp_udp_rule(priv, cmd, rule_list_h, UDP_V4_FLOW);
> +		break;

All those functions need to be able to return errors.

> +	}
> +	return 0;
> +}
> +
> +static int mlx4_en_flow_replace(struct net_device *dev,
> +				struct ethtool_rxnfc *cmd)
> +{
> +	int err;
> +	struct mlx4_en_priv *priv = netdev_priv(dev);
> +	struct ethtool_flow_id *loc_rule;
> +	struct mlx4_spec_list *spec, *tmp_spec;
> +	u32 qpn;
> +	u64 reg_id;
> +
> +	struct mlx4_net_trans_rule rule = {
> +		.queue_mode = MLX4_NET_TRANS_Q_FIFO,
> +		.exclusive = 0,
> +		.allow_loopback = 1,
> +		.promisc_mode = MLX4_FS_PROMISC_NONE,
> +	};
> +
> +	rule.port = priv->port;
> +	rule.priority = MLX4_DOMAIN_ETHTOOL | cmd->fs.location;
> +	INIT_LIST_HEAD(&rule.list);
> +
> +	/* Allow direct QP attaches if the EN_ETHTOOL_QP_ATTACH flag is set */
> +	if (cmd->fs.ring_cookie == RX_CLS_FLOW_DISC)
> +		return -EOPNOTSUPP;

EINVAL.

[...]
>  static int mlx4_en_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
>  			     u32 *rule_locs)
>  {
>  	struct mlx4_en_priv *priv = netdev_priv(dev);
> +	struct mlx4_en_dev *mdev = priv->mdev;
>  	int err = 0;
> +	int i = 0, priority = 0;
> +
> +	if (mdev->dev->caps.steering_mode != MLX4_STEERING_MODE_DEVICE_MANAGED)
> +		return -EOPNOTSUPP;
>  
>  	switch (cmd->cmd) {
>  	case ETHTOOL_GRXRINGS:
>  		cmd->data = priv->rx_ring_num;
>  		break;
> +	case ETHTOOL_GRXCLSRLCNT:
> +		cmd->rule_cnt = mlx4_en_get_num_flows(priv);
> +		break;
> +	case ETHTOOL_GRXCLSRULE:
> +		err = mlx4_en_get_flow(dev, cmd, cmd->fs.location);
> +		break;
> +	case ETHTOOL_GRXCLSRLALL:
> +		while (!err || err == -ENOENT) {
> +			err = mlx4_en_get_flow(dev, cmd, i);
> +			if (!err)
> +				((u32 *)(rule_locs))[priority++] = i;

I don't see any range check against cmd->rule_cnt.

Why are you casting rule_locs?

Also, are the rules really prioritised by location, so that if rule 0
and 1 match a packet then only rule 0 is applied?  If they are actually
prioritised by the match type then you need to assign locations on the
driver side that reflect the actual priorities.

> +			i++;
> +		}
> +		if (priority)
> +			err = 0;
[...]

But if there are no rules defined, this is an error?  That's not right.
I think you should unconditionally set err = 0 here.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-01 16:00   ` Ben Hutchings
@ 2012-07-01 16:38     ` Joe Perches
  2012-07-01 17:31       ` Joe Perches
  2012-07-02 12:57     ` Or Gerlitz
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 45+ messages in thread
From: Joe Perches @ 2012-07-01 16:38 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Or Gerlitz, davem, roland, yevgenyp, oren, netdev, Hadar Hen Zion

On Sun, 2012-07-01 at 17:00 +0100, Ben Hutchings wrote:
> On Sun, 2012-07-01 at 12:43 +0300, Or Gerlitz wrote:
> > From: Hadar Hen Zion <hadarh@mellanox.co.il>
[]
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
[]
> > @@ -599,16 +603,360 @@ static int mlx4_en_set_rxfh_indir(struct net_device *dev,
> >  	return err;
> >  }
> >  
> > +#define not_all_zeros_or_all_ones(field, type) \
> > +	(field && (type)~field)

I think this macro is suboptimal because
negated names are easy to misuse.

I think type is also unnecessary and too
easy to mismatch or keep up to date with
field type changes.

Perhaps it's better as:

#define all_zeros_or_all_ones(field)		\
({						\
	field && (typeof(field))~field;		\
})

> > +
> > +static int mlx4_en_validate_flow(struct net_device *dev,
> > +				 struct ethtool_rxnfc *cmd)
> > +{
[]
> > +		/* don't allow mask which isn't all 0 or 1 */
> > +		if (not_all_zeros_or_all_ones(l4_mask->ip4src, __be32) ||
> > +		    not_all_zeros_or_all_ones(l4_mask->ip4dst, __be32) ||
> > +		    not_all_zeros_or_all_ones(l4_mask->psrc, __be16) ||
> > +		    not_all_zeros_or_all_ones(l4_mask->pdst, __be16))
> > +			return -EOPNOTSUPP;

		if (!all_zeros_or_all_ones(l4_mask->ip4src) ||
		    !all_zeros_or_all_ones(l4_mask->ip4dst) ||
		    !all_zeros_or_all_ones(l4_mask->psrc) ||
		    !all_zeros_or_all_ones(l4_mask->pdst))

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-01 16:38     ` Joe Perches
@ 2012-07-01 17:31       ` Joe Perches
  2012-07-01 18:48         ` Andreas Schwab
  0 siblings, 1 reply; 45+ messages in thread
From: Joe Perches @ 2012-07-01 17:31 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Or Gerlitz, davem, roland, yevgenyp, oren, netdev, Hadar Hen Zion

On Sun, 2012-07-01 at 09:38 -0700, Joe Perches wrote:

> I think this macro is suboptimal because
> negated names are easy to misuse.
> 
> I think type is also unnecessary and too
> easy to mismatch or keep up to date with
> field type changes.
> 
> Perhaps it's better as:
> 
> #define all_zeros_or_all_ones(field)		\
> ({						\
> 	field && (typeof(field))~field;		\
> })

Umm, or not.

It helps when I actually test the code not just type
it into an email client.

	!(field && (typeof(field))~field)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-01 17:31       ` Joe Perches
@ 2012-07-01 18:48         ` Andreas Schwab
  2012-07-01 19:52           ` Joe Perches
  0 siblings, 1 reply; 45+ messages in thread
From: Andreas Schwab @ 2012-07-01 18:48 UTC (permalink / raw)
  To: Joe Perches
  Cc: Ben Hutchings, Or Gerlitz, davem, roland, yevgenyp, oren, netdev,
	Hadar Hen Zion

Joe Perches <joe@perches.com> writes:

> On Sun, 2012-07-01 at 09:38 -0700, Joe Perches wrote:
>
>> I think this macro is suboptimal because
>> negated names are easy to misuse.
>> 
>> I think type is also unnecessary and too
>> easy to mismatch or keep up to date with
>> field type changes.
>> 
>> Perhaps it's better as:
>> 
>> #define all_zeros_or_all_ones(field)		\
>> ({						\
>> 	field && (typeof(field))~field;		\
>> })
>
> Umm, or not.
>
> It helps when I actually test the code not just type
> it into an email client.
>
> 	!(field && (typeof(field))~field)

Or write it as (!field || !(typeof(field))~field) which more closely
resembles what the macro name expresses.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-01 18:48         ` Andreas Schwab
@ 2012-07-01 19:52           ` Joe Perches
  2012-07-02 10:19             ` David Laight
  0 siblings, 1 reply; 45+ messages in thread
From: Joe Perches @ 2012-07-01 19:52 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Ben Hutchings, Or Gerlitz, davem, roland, yevgenyp, oren, netdev,
	Hadar Hen Zion

On Sun, 2012-07-01 at 20:48 +0200, Andreas Schwab wrote:
> Joe Perches <joe@perches.com> writes:
> > On Sun, 2012-07-01 at 09:38 -0700, Joe Perches wrote:
> >> Perhaps it's better as:
> >> 
> >> #define all_zeros_or_all_ones(field)		\
> >> ({						\
> >> 	field && (typeof(field))~field;		\
> >> })
> >
> > Umm, or not.
> >
> > It helps when I actually test the code not just type
> > it into an email client.
> >
> > 	!(field && (typeof(field))~field)
> 
> Or write it as (!field || !(typeof(field))~field) which more closely
> resembles what the macro name expresses.

Better still, or maybe:

	field == 0 || field == (typeof field)~0

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-01 12:29     ` Or Gerlitz
@ 2012-07-01 21:42       ` David Miller
  2012-07-02  7:55         ` Or Gerlitz
  0 siblings, 1 reply; 45+ messages in thread
From: David Miller @ 2012-07-01 21:42 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, yevgenyp, oren, netdev, hadarh

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Sun, 1 Jul 2012 15:29:41 +0300

> And for the time being use the module param, makes sense?

This talk about making an alternative, clean, mechanism later
is non-sense.  Because you'll never be able to get rid of this
module param once you add it.

Module parameters stink because every driver is going to provide the
knob differently, with a different name, and different semantics.

This creates a terrible user experience, and I will not allow it.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-01 21:42       ` David Miller
@ 2012-07-02  7:55         ` Or Gerlitz
  2012-07-02  8:14           ` Roland Dreier
  2012-07-02  8:34           ` David Miller
  0 siblings, 2 replies; 45+ messages in thread
From: Or Gerlitz @ 2012-07-02  7:55 UTC (permalink / raw)
  To: David Miller, roland; +Cc: yevgenyp, oren, netdev, hadarh

On 7/2/2012 12:42 AM, David Miller wrote:
> [...] Module parameters stink because every driver is going to provide 
> the knob differently, with a different name, and different semantics. 
> This creates a terrible user experience, and I will not allow it. 

OK, so if looking on what we are left with on the table, seems that 
sysfs entry on the mlx4_core
level (as we do for the port link type {IB, Eth} or IB port MTU) could 
be fine here, Roland, agree?

Or.

We're talking on the /sys/bus/pci/devices entry for the card, e.g for a 
card sitting
in 06:00.0 this new entry will be under /sys/bus/pci/devices/0000:06:00.0/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-02  7:55         ` Or Gerlitz
@ 2012-07-02  8:14           ` Roland Dreier
  2012-07-02  8:28             ` Or Gerlitz
  2012-07-02  8:34           ` David Miller
  1 sibling, 1 reply; 45+ messages in thread
From: Roland Dreier @ 2012-07-02  8:14 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: David Miller, yevgenyp, oren, netdev, hadarh

On Mon, Jul 2, 2012 at 12:55 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> On 7/2/2012 12:42 AM, David Miller wrote:
>> [...] Module parameters stink because every driver is going to provide the
>> knob differently, with a different name, and different semantics. This
>> creates a terrible user experience, and I will not allow it.
>
> OK, so if looking on what we are left with on the table, seems that sysfs
> entry on the mlx4_core
> level (as we do for the port link type {IB, Eth} or IB port MTU) could be
> fine here, Roland, agree?

What was wrong with Dave's initial suggestion of ethtool?  All the
other steering stuff is configured that way, why not this hash?

 - R.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-02  8:14           ` Roland Dreier
@ 2012-07-02  8:28             ` Or Gerlitz
  2012-07-02  8:36               ` David Miller
  0 siblings, 1 reply; 45+ messages in thread
From: Or Gerlitz @ 2012-07-02  8:28 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Miller, yevgenyp, oren, netdev, hadarh

On 7/2/2012 11:14 AM, Roland Dreier wrote:
> What was wrong with Dave's initial suggestion of ethtool? All the 
> other steering stuff is configured that way, why not this hash?

Roland, as I wrote earlier on this thread -->  [...] pointed out in the 
change-log, note that this policy is **global** to the HCA, that is 
effects all the Ethernet (and down the road, also when adding support 
for IPoIB flow-steering, under a config of card with one IB port and one 
Eth port) net-devices that relate to that mlx4 device instance.

In a system with (say) one card and two Ethernet ports, where for each 
port there's corresponding ethN interface, **both** mlx4_en net-devices 
use the same instance of struct mlx4_device, which means that if we let 
the user to set through ethtool the flow steering hash of ethN1 this 
will evetually change also the hash used for packets going to ethN2, or 
in other words, in mlx4 language this param belong to the mlx4_core 
module. In that respect, I was  thinking on using sysfs as we do for the 
port link type and IB mtu, hope this makes things clearer, SB  the 
relevant code, now with the global module param which can change to 
using per mlx4_core device sysfs.

Or.



> --- a/drivers/net/ethernet/mellanox/mlx4/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
> @@ -1231,6 +1236,21 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
>   			goto err_stop_fw;
>   		}
>   
> +		priv->fs_hash_mode = mlx4_flow_steering_hash;
> +
> +		switch (priv->fs_hash_mode) {
> +		case MLX4_FS_L2_HASH:
> +			init_hca.fs_hash_enable_bits = 0;
> +			break;
> +
> +		case MLX4_FS_L2_L3_L4_HASH:
> +			/* Enable flow steering with
> +			   udp unicast and tcp unicast*/
> +			init_hca.fs_hash_enable_bits =
> +				MLX4_FS_UDP_UC_EN | MLX4_FS_TCP_UC_EN;
> +			break;
> +		}
> +
>   		profile = default_profile;
>   		if (dev->caps.steering_mode ==
>   		    MLX4_STEERING_MODE_DEVICE_MANAGED)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-02  7:55         ` Or Gerlitz
  2012-07-02  8:14           ` Roland Dreier
@ 2012-07-02  8:34           ` David Miller
  2012-07-02 18:07             ` Ben Hutchings
  1 sibling, 1 reply; 45+ messages in thread
From: David Miller @ 2012-07-02  8:34 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, yevgenyp, oren, netdev, hadarh

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Mon, 2 Jul 2012 10:55:28 +0300

> On 7/2/2012 12:42 AM, David Miller wrote:
>> [...] Module parameters stink because every driver is going to provide
>> the knob differently, with a different name, and different
>> semantics. This creates a terrible user experience, and I will not
>> allow it.
> 
> OK, so if looking on what we are left with on the table, seems that
> sysfs entry on the mlx4_core
> level (as we do for the port link type {IB, Eth} or IB port MTU) could
> be fine here, Roland, agree?

No way.

You have to create a real interface, that other vendors with similar
chips can consistently use.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-02  8:28             ` Or Gerlitz
@ 2012-07-02  8:36               ` David Miller
  2012-07-02 13:00                 ` Or Gerlitz
  0 siblings, 1 reply; 45+ messages in thread
From: David Miller @ 2012-07-02  8:36 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, yevgenyp, oren, netdev, hadarh

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Mon, 2 Jul 2012 11:28:18 +0300

> On 7/2/2012 11:14 AM, Roland Dreier wrote:
>> What was wrong with Dave's initial suggestion of ethtool? All the
>> other steering stuff is configured that way, why not this hash?
> 
> Roland, as I wrote earlier on this thread --> [...] pointed out in the
> change-log, note that this policy is **global** to the HCA, that is
> effects all the Ethernet (and down the road, also when adding support
> for IPoIB flow-steering, under a config of card with one IB port and
> one Eth port) net-devices that relate to that mlx4 device instance.
> 
> In a system with (say) one card and two Ethernet ports, where for each
> port there's corresponding ethN interface, **both** mlx4_en
> net-devices use the same instance of struct mlx4_device, which means
> that if we let the user to set through ethtool the flow steering hash
> of ethN1 this will evetually change also the hash used for packets
> going to ethN2, or in other words, in mlx4 language this param belong
> to the mlx4_core module. In that respect, I was thinking on using
> sysfs as we do for the port link type and IB mtu, hope this makes
> things clearer, SB the relevant code, now with the global module param
> which can change to using per mlx4_core device sysfs.

I frankly don't care what your special unique situation is.

You cannot create chipset specific interfaces like module parms
and randomly named sysfs files as an interface to configure your
hardware.

Other chipsets will want the same thing or something similar.

So you must create a real generic interface that other chipsets
in similar situations can hook into as well.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-01 19:52           ` Joe Perches
@ 2012-07-02 10:19             ` David Laight
  2012-07-02 11:35               ` Andreas Schwab
  0 siblings, 1 reply; 45+ messages in thread
From: David Laight @ 2012-07-02 10:19 UTC (permalink / raw)
  To: Joe Perches, Andreas Schwab
  Cc: Ben Hutchings, Or Gerlitz, davem, roland, yevgenyp, oren, netdev,
	Hadar Hen Zion

 
> > Or write it as (!field || !(typeof(field))~field) which more closely
> > resembles what the macro name expresses.
> 
> Better still, or maybe:
> 
> 	field == 0 || field == (typeof field)~0

Which doesn't work when sizeof(field) > sizeof(int).
Needs another cast.

	field == 0 || field == (typeof field)~(typeof field)0

	David

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-02 10:19             ` David Laight
@ 2012-07-02 11:35               ` Andreas Schwab
  2012-07-02 12:15                 ` David Laight
  2012-07-03  8:14                 ` Or Gerlitz
  0 siblings, 2 replies; 45+ messages in thread
From: Andreas Schwab @ 2012-07-02 11:35 UTC (permalink / raw)
  To: David Laight
  Cc: Joe Perches, Ben Hutchings, Or Gerlitz, davem, roland, yevgenyp,
	oren, netdev, Hadar Hen Zion

"David Laight" <David.Laight@ACULAB.COM> writes:

>  
>> > Or write it as (!field || !(typeof(field))~field) which more closely
>> > resembles what the macro name expresses.
>> 
>> Better still, or maybe:
>> 
>> 	field == 0 || field == (typeof field)~0
>
> Which doesn't work when sizeof(field) > sizeof(int).
> Needs another cast.
>
> 	field == 0 || field == (typeof field)~(typeof field)0

You can avoid that by using (typeof field)-1.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-02 11:35               ` Andreas Schwab
@ 2012-07-02 12:15                 ` David Laight
  2012-07-03  8:14                 ` Or Gerlitz
  1 sibling, 0 replies; 45+ messages in thread
From: David Laight @ 2012-07-02 12:15 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Joe Perches, Ben Hutchings, Or Gerlitz, davem, roland, yevgenyp,
	oren, netdev, Hadar Hen Zion

 
> "David Laight" <David.Laight@ACULAB.COM> writes:
> 
> >  
> >> > Or write it as (!field || !(typeof(field))~field) which more
closely
> >> > resembles what the macro name expresses.
> >> 
> >> Better still, or maybe:
> >> 
> >> 	field == 0 || field == (typeof field)~0
> >
> > Which doesn't work when sizeof(field) > sizeof(int).
> > Needs another cast.
> >
> > 	field == 0 || field == (typeof field)~(typeof field)0
> 
> You can avoid that by using (typeof field)-1.

Gah, I thought I knew the integral promotion rules!
-1 and ~0 are both 'integer' and get treated the same.

A quick test shows that gcc does sign extend when converting
32bit int to 64bit unsigned long long.
Which probably means that is required by the standard!

	David

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-01 16:00   ` Ben Hutchings
  2012-07-01 16:38     ` Joe Perches
@ 2012-07-02 12:57     ` Or Gerlitz
  2012-07-03  1:47       ` Ben Hutchings
  2012-07-02 13:32     ` Or Gerlitz
  2012-07-03  9:00     ` Or Gerlitz
  3 siblings, 1 reply; 45+ messages in thread
From: Or Gerlitz @ 2012-07-02 12:57 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: davem, roland, Yevgeny Petrilin, Oren Duer, netdev,
	Hadar Hen Zion, Amir Vadai

On 7/1/2012 7:00 PM, Ben Hutchings wrote:
[...]

Hi Ben,

Thanks for the detailed feedback, see below some responses


> +		l4_mask = &cmd->fs.m_u.tcp_ip4_spec;
> +		/* don't allow mask which isn't all 0 or 1 */
> +		if (not_all_zeros_or_all_ones(l4_mask->ip4src, __be32) ||
> +		    not_all_zeros_or_all_ones(l4_mask->ip4dst, __be32) ||
> +		    not_all_zeros_or_all_ones(l4_mask->psrc, __be16) ||
> +		    not_all_zeros_or_all_ones(l4_mask->pdst, __be16))
> +			return -EOPNOTSUPP;
>
> Again, here and in many further instances, the error code should be EINVAL.


OK, will fix to use EINVAL instead of EOPNOTSUPP here and else-where needed.


> +
> +static void add_ip_rule(struct mlx4_en_priv *priv,
> +			struct ethtool_rxnfc *cmd,
> +			struct list_head *list_h)
> +{
> +	struct mlx4_spec_list *spec_l3;
> +
> +	spec_l3 = kzalloc(sizeof *spec_l3, GFP_KERNEL);
> +	if (!spec_l3) {
> +		en_err(priv, "Fail to alloc ethtool rule.\n");
> +		return;
> +	}
>
> This should return an error code as well - logging is not a substitue.

OK, will do.

>
>
>> +	spec_l3->id = MLX4_NET_TRANS_RULE_ID_IPV4;
>> +	spec_l3->ipv4.src_ip = cmd->fs.h_u.usr_ip4_spec.ip4src;
>> +	if (spec_l3->ipv4.src_ip)
>> +		spec_l3->ipv4.src_ip_msk = EN_ETHTOOL_WORD_MASK;
>> +	spec_l3->ipv4.dst_ip = cmd->fs.h_u.usr_ip4_spec.ip4dst;
>> +	if (spec_l3->ipv4.dst_ip)
>> +		spec_l3->ipv4.dst_ip_msk = EN_ETHTOOL_WORD_MASK;
>
> The conditions should be using the mask (cmd->fs.m_u) not the value.

I'd like to clarify things here a bit - the way the code is written, is to

1st make sure that we can deal with the mask provided by the user in the 
ethtool
command, e.g its all zeroes or all ones (leaving a side for a minute the 
other
discussion on how would be best to impl. that check...) - this check is 
done
in mlx4_en_validate_flow

2nd initialize mlx4 flow spec which is all empty - see calls to kzalloc 
spec_l2/l3/l4

3rd import the non-zero values (not masks) from the ethtool command into 
the mlx4
flow specs, with FULL mask


Under this logic, we can use the values and not the masks, isn't that?



>
>
>> +static void add_tcp_udp_rule(struct mlx4_en_priv *priv,
>> +			     struct ethtool_rxnfc *cmd,
>> +			     struct list_head *list_h, int proto)
>> +{
>> +	struct mlx4_spec_list *spec_l3;
>> +	struct mlx4_spec_list *spec_l4;
>> +
>> +	spec_l3 = kzalloc(sizeof *spec_l3, GFP_KERNEL);
>> +	spec_l4 = kzalloc(sizeof *spec_l4, GFP_KERNEL);
>> +	if (!spec_l4 || !spec_l3) {
>> +		en_err(priv, "Fail to alloc ethtool rule.\n");
>
> If one of them was successfully allocated, it will now be leaked.

THANKS, will fix.

>> +static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev,
>> +					     struct ethtool_rxnfc *cmd,
>> +					     struct list_head *rule_list_h)
>> +{
>> +	int err;
>> +	u64 mac;
>> +	__be64 be_mac;
>> +	struct ethhdr *eth_spec;
>> +	struct mlx4_en_priv *priv = netdev_priv(dev);
>> +	struct mlx4_spec_list *spec_l2;
>> +	__be64 mac_msk = cpu_to_be64(EN_ETHTOOL_MAC_MASK << 16);
>> +
>> +	err = mlx4_en_validate_flow(dev, cmd);
>> +	if (err)
>> +		return err;
>> +
>> +	spec_l2 = kzalloc(sizeof *spec_l2, GFP_KERNEL);
>> +	if (!spec_l2)
>> +		return -ENOMEM;
>> +
>> +	mac = priv->mac & EN_ETHTOOL_MAC_MASK;
>> +	be_mac = cpu_to_be64(mac << 16);
>> +
>> +	spec_l2->id = MLX4_NET_TRANS_RULE_ID_ETH;
>> +	memcpy(spec_l2->eth.dst_mac_msk, &mac_msk, ETH_ALEN);
>> +	if ((cmd->fs.flow_type & ~FLOW_EXT) != ETHER_FLOW)
>> +		memcpy(spec_l2->eth.dst_mac, &be_mac, ETH_ALEN);
>
> Does the hardware require filtering by MAC address and not only by layer 3/4 addresses?

YES

>
>
>> +	if ((cmd->fs.flow_type & FLOW_EXT) && cmd->fs.m_ext.vlan_tci) {
>> +		spec_l2->eth.vlan_id = cmd->fs.h_ext.vlan_tci;
>> +		spec_l2->eth.vlan_id_msk = cpu_to_be16(0xfff);
>
> This doesn't match mlx4_en_validate_flow(); you are replacing a mask of
> 0xffff with 0xfff.

I need to check how exactly this should be done here, vlan ID is only 12 
bits in size, so this is
probably the source for the 0xfff vs 0xffff


>
>
>> +	switch (cmd->fs.flow_type & ~FLOW_EXT) {
>> +	case ETHER_FLOW:
>> +		eth_spec = &cmd->fs.h_u.ether_spec;
>> +		memcpy(&spec_l2->eth.dst_mac, eth_spec->h_dest, ETH_ALEN);
>> +		spec_l2->eth.ether_type = eth_spec->h_proto;
>> +		if (eth_spec->h_proto)
>> +			spec_l2->eth.ether_type_enable = 1;
>> +		break;
>> +	case IP_USER_FLOW:
>> +		add_ip_rule(priv, cmd, rule_list_h);
>> +		break;
>> +	case TCP_V4_FLOW:
>> +		add_tcp_udp_rule(priv, cmd, rule_list_h, TCP_V4_FLOW);
>> +		break;
>> +	case UDP_V4_FLOW:
>> +		add_tcp_udp_rule(priv, cmd, rule_list_h, UDP_V4_FLOW);
>> +		break;
>
> All those functions need to be able to return errors.


OK

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-02  8:36               ` David Miller
@ 2012-07-02 13:00                 ` Or Gerlitz
  0 siblings, 0 replies; 45+ messages in thread
From: Or Gerlitz @ 2012-07-02 13:00 UTC (permalink / raw)
  To: David Miller; +Cc: roland, yevgenyp, oren, netdev, hadarh, Amir Vadai

On 7/2/2012 11:36 AM, David Miller wrote:
> So you must create a real generic interface that other chipsets in similar situations can hook into as well.

OK, understood.

We will remove the module param from the patch set, such that at this point
of submission,  there's no run time setting for the flow steering hash 
function.

Or.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-01 16:00   ` Ben Hutchings
  2012-07-01 16:38     ` Joe Perches
  2012-07-02 12:57     ` Or Gerlitz
@ 2012-07-02 13:32     ` Or Gerlitz
  2012-07-03  1:50       ` Ben Hutchings
  2012-07-03  9:00     ` Or Gerlitz
  3 siblings, 1 reply; 45+ messages in thread
From: Or Gerlitz @ 2012-07-02 13:32 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: davem, roland, yevgenyp, oren, netdev, Hadar Hen Zion, Amir Vadai

On 7/1/2012 7:00 PM, Ben Hutchings wrote:
>> static int mlx4_en_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
>> >  			     u32 *rule_locs)
>> >  {
>> >  	struct mlx4_en_priv *priv = netdev_priv(dev);
>> >+	struct mlx4_en_dev *mdev = priv->mdev;
>> >  	int err = 0;
>> >+	int i = 0, priority = 0;
>> >+
>> >+	if (mdev->dev->caps.steering_mode != MLX4_STEERING_MODE_DEVICE_MANAGED)
>> >+		return -EOPNOTSUPP;
>> >  
>> >  	switch (cmd->cmd) {
>> >  	case ETHTOOL_GRXRINGS:
>> >  		cmd->data = priv->rx_ring_num;
>> >  		break;
>> >+	case ETHTOOL_GRXCLSRLCNT:
>> >+		cmd->rule_cnt = mlx4_en_get_num_flows(priv);
>> >+		break;
>> >+	case ETHTOOL_GRXCLSRULE:
>> >+		err = mlx4_en_get_flow(dev, cmd, cmd->fs.location);
>> >+		break;
>> >+	case ETHTOOL_GRXCLSRLALL:
>> >+		while (!err || err == -ENOENT) {
>> >+			err = mlx4_en_get_flow(dev, cmd, i);
>> >+			if (!err)
>> >+				((u32 *)(rule_locs))[priority++] = i;
> I don't see any range check against cmd->rule_cnt.

OK, will fix to make sure we don't cross cmd->rule_cnt

>
>
> Why are you casting rule_locs?

doesn't seem to be needed, will remove that casting

>
>
> Also, are the rules really prioritised by location, so that if rule 0
> and 1 match a packet then only rule 0 is applied?  If they are actually
> prioritised by the match type then you need to assign locations on the
> driver side that reflect the actual priorities.


Rules are prioritized by a scheme made of "domain" X location, see below 
and in patch #6
the MLX4_DOMAIN_yyy entries, higher domain value means lower priority, 
so for instance rules
placed by ethtool would have higher priority over rules added by the L2 
NIC  or by future RFS
patch. Within a domain, the location matters.

You can see that simple L2 rules (e.g contain dest-mac, possibly vlan) 
added by the driver
use the NIC domain, wheres those added to serve ethtool commands use the 
ETHTOOL one.

Within the ethtool domain, we maintain the priority as the location 
specified by the user, hope this explains
things.

> +enum {
> +	MLX4_DOMAIN_UVERBS	= 0x1000,
> +	MLX4_DOMAIN_ETHTOOL     = 0x2000,
> +	MLX4_DOMAIN_RFS         = 0x3000,
> +	MLX4_DOMAIN_NIC    = 0x5000,
> +};

>> >+			i++;
>> >+		}
>> >+		if (priority)
>> >+			err = 0;
> [...]
>
> But if there are no rules defined, this is an error?  That's not right.
> I think you should unconditionally set err = 0 here.

OK, will do.

Or.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-02  8:34           ` David Miller
@ 2012-07-02 18:07             ` Ben Hutchings
  2012-07-03  0:15               ` David Miller
  0 siblings, 1 reply; 45+ messages in thread
From: Ben Hutchings @ 2012-07-02 18:07 UTC (permalink / raw)
  To: David Miller; +Cc: ogerlitz, roland, yevgenyp, oren, netdev, hadarh

On Mon, 2012-07-02 at 01:34 -0700, David Miller wrote:
> From: Or Gerlitz <ogerlitz@mellanox.com>
> Date: Mon, 2 Jul 2012 10:55:28 +0300
> 
> > On 7/2/2012 12:42 AM, David Miller wrote:
> >> [...] Module parameters stink because every driver is going to provide
> >> the knob differently, with a different name, and different
> >> semantics. This creates a terrible user experience, and I will not
> >> allow it.
> > 
> > OK, so if looking on what we are left with on the table, seems that
> > sysfs entry on the mlx4_core
> > level (as we do for the port link type {IB, Eth} or IB port MTU) could
> > be fine here, Roland, agree?
> 
> No way.
> 
> You have to create a real interface, that other vendors with similar
> chips can consistently use.

But there may not be enough commonality to define a non- vendor-specific
API.  And ethtool really isn't a good way to expose parameters that are
per-controller rather than per-net-device, particularly if changing them
may disrupt all running net devices on that controller and not just the
one used to invoke SIOCETHTOOL.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-02 18:07             ` Ben Hutchings
@ 2012-07-03  0:15               ` David Miller
  2012-07-03  1:04                 ` David Miller
  0 siblings, 1 reply; 45+ messages in thread
From: David Miller @ 2012-07-03  0:15 UTC (permalink / raw)
  To: bhutchings; +Cc: ogerlitz, roland, yevgenyp, oren, netdev, hadarh

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 2 Jul 2012 19:07:25 +0100

> But there may not be enough commonality to define a non- vendor-specific
> API.  And ethtool really isn't a good way to expose parameters that are
> per-controller rather than per-net-device, particularly if changing them
> may disrupt all running net devices on that controller and not just the
> one used to invoke SIOCETHTOOL.

I fundamentally disagree with you.

Are you really saying that it's OK for every damn vendor to define
their own magic knob to control stuff like this?  Surely you're not.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-03  0:15               ` David Miller
@ 2012-07-03  1:04                 ` David Miller
  2012-07-03 11:10                   ` Or Gerlitz
  0 siblings, 1 reply; 45+ messages in thread
From: David Miller @ 2012-07-03  1:04 UTC (permalink / raw)
  To: bhutchings; +Cc: ogerlitz, roland, yevgenyp, oren, netdev, hadarh


Just in case you guys _really_ and _truly_ are so unable to think
outside the box that you can't come up with something reasonable, I'll
get you started with two ideas:

1) A special "chipset" dummy netdev that a special class of ethtool
   commands can run on to set these things.

2) A "chipset" genetlink family with suitable operations and
   attributes.

In both cases appropriate mechanisms are added to make for keys that
are used for chipset matching, and device drivers simply register
a notifier handler that is called on two occaisions:

1) When settings are changed.

2) Upon initial handler registry, to acquire the initial settings.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-02 12:57     ` Or Gerlitz
@ 2012-07-03  1:47       ` Ben Hutchings
  2012-07-03  8:56         ` Or Gerlitz
  2012-07-03  8:58         ` Or Gerlitz
  0 siblings, 2 replies; 45+ messages in thread
From: Ben Hutchings @ 2012-07-03  1:47 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: davem, roland, Yevgeny Petrilin, Oren Duer, netdev,
	Hadar Hen Zion, Amir Vadai

On Mon, 2012-07-02 at 15:57 +0300, Or Gerlitz wrote:
> On 7/1/2012 7:00 PM, Ben Hutchings wrote:
[...]
> >> +	spec_l3->id = MLX4_NET_TRANS_RULE_ID_IPV4;
> >> +	spec_l3->ipv4.src_ip = cmd->fs.h_u.usr_ip4_spec.ip4src;
> >> +	if (spec_l3->ipv4.src_ip)
> >> +		spec_l3->ipv4.src_ip_msk = EN_ETHTOOL_WORD_MASK;
> >> +	spec_l3->ipv4.dst_ip = cmd->fs.h_u.usr_ip4_spec.ip4dst;
> >> +	if (spec_l3->ipv4.dst_ip)
> >> +		spec_l3->ipv4.dst_ip_msk = EN_ETHTOOL_WORD_MASK;
> >
> > The conditions should be using the mask (cmd->fs.m_u) not the value.
> 
> I'd like to clarify things here a bit - the way the code is written, is to
> 
> 1st make sure that we can deal with the mask provided by the user in the 
> ethtool
> command, e.g its all zeroes or all ones (leaving a side for a minute the 
> other
> discussion on how would be best to impl. that check...) - this check is 
> done
> in mlx4_en_validate_flow
> 
> 2nd initialize mlx4 flow spec which is all empty - see calls to kzalloc 
> spec_l2/l3/l4
> 
> 3rd import the non-zero values (not masks) from the ethtool command into 
> the mlx4
> flow specs, with FULL mask
> 
> 
> Under this logic, we can use the values and not the masks, isn't that?

No, it's perfectly valid to specify a filter that matches, for example,
a destination IP address of 0.0.0.0 with mask of 255.255.255.255.  So
you really need to check the mask.  If your filter hardware doesn't
support zero values for some fields then you'll need to reject them in
mlx4_en_validate_flow.

[...]
> >> +static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev,
> >> +					     struct ethtool_rxnfc *cmd,
> >> +					     struct list_head *rule_list_h)
> >> +{
> >> +	int err;
> >> +	u64 mac;
> >> +	__be64 be_mac;
> >> +	struct ethhdr *eth_spec;
> >> +	struct mlx4_en_priv *priv = netdev_priv(dev);
> >> +	struct mlx4_spec_list *spec_l2;
> >> +	__be64 mac_msk = cpu_to_be64(EN_ETHTOOL_MAC_MASK << 16);
> >> +
> >> +	err = mlx4_en_validate_flow(dev, cmd);
> >> +	if (err)
> >> +		return err;
> >> +
> >> +	spec_l2 = kzalloc(sizeof *spec_l2, GFP_KERNEL);
> >> +	if (!spec_l2)
> >> +		return -ENOMEM;
> >> +
> >> +	mac = priv->mac & EN_ETHTOOL_MAC_MASK;
> >> +	be_mac = cpu_to_be64(mac << 16);
> >> +
> >> +	spec_l2->id = MLX4_NET_TRANS_RULE_ID_ETH;
> >> +	memcpy(spec_l2->eth.dst_mac_msk, &mac_msk, ETH_ALEN);
> >> +	if ((cmd->fs.flow_type & ~FLOW_EXT) != ETHER_FLOW)
> >> +		memcpy(spec_l2->eth.dst_mac, &be_mac, ETH_ALEN);
> >
> > Does the hardware require filtering by MAC address and not only by layer 3/4 addresses?
> 
> YES

That's a pity.  Maybe the API should be extended so the driver can at
least report that the filter is narrower than requested.

> >> +	if ((cmd->fs.flow_type & FLOW_EXT) && cmd->fs.m_ext.vlan_tci) {
> >> +		spec_l2->eth.vlan_id = cmd->fs.h_ext.vlan_tci;
> >> +		spec_l2->eth.vlan_id_msk = cpu_to_be16(0xfff);
> >
> > This doesn't match mlx4_en_validate_flow(); you are replacing a mask of
> > 0xffff with 0xfff.
> 
> I need to check how exactly this should be done here, vlan ID is only 12 
> bits in size, so this is
> probably the source for the 0xfff vs 0xffff
[...]

If the hardware can only match the VID then you need to validate that
the mask is either 0 or 0xfff instead of 0 or 0xffff.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-02 13:32     ` Or Gerlitz
@ 2012-07-03  1:50       ` Ben Hutchings
  0 siblings, 0 replies; 45+ messages in thread
From: Ben Hutchings @ 2012-07-03  1:50 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: davem, roland, yevgenyp, oren, netdev, Hadar Hen Zion, Amir Vadai

On Mon, 2012-07-02 at 16:32 +0300, Or Gerlitz wrote:
[...]
> > Also, are the rules really prioritised by location, so that if rule 0
> > and 1 match a packet then only rule 0 is applied?  If they are actually
> > prioritised by the match type then you need to assign locations on the
> > driver side that reflect the actual priorities.
> 
> 
> Rules are prioritized by a scheme made of "domain" X location, see below 
> and in patch #6
> the MLX4_DOMAIN_yyy entries, higher domain value means lower priority, 
> so for instance rules
> placed by ethtool would have higher priority over rules added by the L2 
> NIC  or by future RFS
> patch. Within a domain, the location matters.
> 
> You can see that simple L2 rules (e.g contain dest-mac, possibly vlan) 
> added by the driver
> use the NIC domain, wheres those added to serve ethtool commands use the 
> ETHTOOL one.
> 
> Within the ethtool domain, we maintain the priority as the location 
> specified by the user, hope this explains
> things.
[...]

Good, I didn't read the other patches but just wanted to make sure you
had thought about this.

Ben

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-02 11:35               ` Andreas Schwab
  2012-07-02 12:15                 ` David Laight
@ 2012-07-03  8:14                 ` Or Gerlitz
  1 sibling, 0 replies; 45+ messages in thread
From: Or Gerlitz @ 2012-07-03  8:14 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: David Laight, Joe Perches, Ben Hutchings, davem, roland,
	yevgenyp, oren, netdev, Hadar Hen Zion, Amir Vadai

On 7/2/2012 2:35 PM, Andreas Schwab wrote:
>
> 	field == 0 || field == (typeof field)~(typeof field)0
> You can avoid that by using (typeof field)-1.
>

OK, thanks everybody, we will take that path.

Or.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-03  1:47       ` Ben Hutchings
@ 2012-07-03  8:56         ` Or Gerlitz
  2012-07-03  8:58         ` Or Gerlitz
  1 sibling, 0 replies; 45+ messages in thread
From: Or Gerlitz @ 2012-07-03  8:56 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: davem, roland, Yevgeny Petrilin, Oren Duer, netdev,
	Hadar Hen Zion, Amir Vadai

On 7/3/2012 4:47 AM, Ben Hutchings wrote:
> If the hardware can only match the VID then you need to validate that
> the mask is either 0 or 0xfff instead of 0 or 0xffff.

sure, will fix

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-03  1:47       ` Ben Hutchings
  2012-07-03  8:56         ` Or Gerlitz
@ 2012-07-03  8:58         ` Or Gerlitz
  1 sibling, 0 replies; 45+ messages in thread
From: Or Gerlitz @ 2012-07-03  8:58 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: davem, roland, Yevgeny Petrilin, Oren Duer, netdev,
	Hadar Hen Zion, Amir Vadai

On 7/3/2012 4:47 AM, Ben Hutchings wrote:
>> Under this logic, we can use the values and not the masks, isn't that?
> No, it's perfectly valid to specify a filter that matches, for example,
> a destination IP address of 0.0.0.0 with mask of 255.255.255.255.  So
> you really need to check the mask.  If your filter hardware doesn't
> support zero values for some fields then you'll need to reject them in
> mlx4_en_validate_flow.

Got it, will change to use masks all over the place, as you pointed out 
we need to do.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
  2012-07-01 16:00   ` Ben Hutchings
                       ` (2 preceding siblings ...)
  2012-07-02 13:32     ` Or Gerlitz
@ 2012-07-03  9:00     ` Or Gerlitz
  3 siblings, 0 replies; 45+ messages in thread
From: Or Gerlitz @ 2012-07-03  9:00 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: davem, roland, yevgenyp, oren, netdev, Hadar Hen Zion, Amir Vadai

On 7/1/2012 7:00 PM, Ben Hutchings wrote:
>> +#define not_all_zeros_or_all_ones(field, type) \
>> >+	(field && (type)~field)
>> >+
>> >+static int mlx4_en_validate_flow(struct net_device *dev,
>> >+				 struct ethtool_rxnfc *cmd)
>> >+{
>> >+	struct ethtool_usrip4_spec *l3_mask;
>> >+	struct ethtool_tcpip4_spec *l4_mask;
>> >+	struct ethhdr *eth_mask;
>> >+	u64 full_mac = ~0ull;
>> >+	u64 zero_mac = 0;
>> >+
>> >+	if (cmd->fs.location >= MAX_NUM_OF_FS_RULES)
>> >+		return -EINVAL;
>> >+
>> >+	switch (cmd->fs.flow_type & ~FLOW_EXT) {
>> >+	case TCP_V4_FLOW:
>> >+	case UDP_V4_FLOW:
>> >+		if (cmd->fs.h_u.tcp_ip4_spec.tos)
>> >+			return -EOPNOTSUPP;
> I suspect that your filter ignores TOS, rather than only matching TOS ==
> 0, so you should actually be checking the corresponding field in the
> mask (fs.m_u). [...]

OK, thanks for pointing this over, will fix.

> >+		break;
> >+	case IP_USER_FLOW:
> >+		l3_mask = &cmd->fs.m_u.usr_ip4_spec;
> >+		if (cmd->fs.h_u.usr_ip4_spec.l4_4_bytes ||
> >+		    cmd->fs.h_u.usr_ip4_spec.tos ||
> I think this should be checking l4_4_bytes and tos in the mask.

OK

>
>> >+		    cmd->fs.h_u.usr_ip4_spec.proto ||
>> >+		    cmd->fs.h_u.usr_ip4_spec.ip_ver != ETH_RX_NFC_IP4 ||
>> >+		    (!cmd->fs.h_u.usr_ip4_spec.ip4src &&
>> >+		     !cmd->fs.h_u.usr_ip4_spec.ip4dst) ||
>> >+		    not_all_zeros_or_all_ones(l3_mask->ip4src, __be32) ||
>> >+		    not_all_zeros_or_all_ones(l3_mask->ip4dst, __be32))
>> >+			return -EOPNOTSUPP;
>> >+		break;
>> >+	case ETHER_FLOW:
>> >+		eth_mask = &cmd->fs.m_u.ether_spec;
>> >+		if (memcmp(eth_mask->h_source, &zero_mac, ETH_ALEN))
>> >+			return -EOPNOTSUPP;
>> >+		if (!memcmp(eth_mask->h_dest, &zero_mac, ETH_ALEN))
>> >+			return -EOPNOTSUPP;
> But in the next statement you test whether eth_mask->h_dest is either
> all-zeroes or all-ones.  Is all-zeroes valid or not?  I suspect you
> actually intend to reject the case where both h_dest and h_proto are masked out.

indeed, this code section can be better written, will fix for V1


>
>> >+		if (not_all_zeros_or_all_ones(eth_mask->h_proto, __be16) ||
>> >+		    (memcmp(eth_mask->h_dest, &zero_mac, ETH_ALEN) &&
>> >+		     memcmp(eth_mask->h_dest, &full_mac, ETH_ALEN)))
>> >+			return -EOPNOTSUPP;
>> >+		break;
>> >+	default:
>> >+		return -EOPNOTSUPP;
>> >+	}
>> >+
>> >+	if ((cmd->fs.flow_type & FLOW_EXT)) {
>> >+		if (cmd->fs.m_ext.vlan_etype ||
>> >+		    not_all_zeros_or_all_ones(cmd->fs.m_ext.vlan_tci,
>> >+					       __be16)) {
>> >+			return -EOPNOTSUPP;
>> >+		}
>> >+	}
>> >+
>> >+	return 0;
>> >+}

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
  2012-07-03  1:04                 ` David Miller
@ 2012-07-03 11:10                   ` Or Gerlitz
  0 siblings, 0 replies; 45+ messages in thread
From: Or Gerlitz @ 2012-07-03 11:10 UTC (permalink / raw)
  To: David Miller
  Cc: bhutchings, roland, yevgenyp, oren, netdev, hadarh, Amir Vadai

On 7/3/2012 4:04 AM, David Miller wrote:
>
> Just in case you guys _really_ and _truly_ are so unable to think
> outside the box that you can't come up with something reasonable, I'll
> get you started with two ideas:
>
> 1) A special "chipset" dummy netdev that a special class of ethtool
>     commands can run on to set these things.
>
> 2) A "chipset" genetlink family with suitable operations and
>     attributes.

Dave,

Thanks for trying to address the need here, as I wrote you, we've 
removed the module param
from the patch-set and will submit V1 without this. Once the comments 
are over and hopefully
the patches are accepted, we'll see what can/need to be done for 
allowing that flexibility.

Or.

>
>
> In both cases appropriate mechanisms are added to make for keys that
> are used for chipset matching, and device drivers simply register
> a notifier handler that is called on two occaisions:
>
> 1) When settings are changed.
>
> 2) Upon initial handler registry, to acquire the initial settings.

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2012-07-03 11:10 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-01  9:43 [PATCH net-next 00/10] net/mlx4: Add flow-steering support Or Gerlitz
2012-07-01  9:43 ` [PATCH net-next 01/10] net/mlx4_core: Change resource tracking mechanism to use red-black tree Or Gerlitz
2012-07-01 10:17   ` David Miller
2012-07-01  9:43 ` [PATCH net-next 02/10] net/mlx4_core: Change resource tracking ID to be 64 bit Or Gerlitz
2012-07-01  9:43 ` [PATCH net-next 03/10] net/mlx4_en: Re-design multicast attachments flow Or Gerlitz
2012-07-01 10:32   ` David Miller
2012-07-01  9:43 ` [PATCH net-next 04/10] net/mlx4: Set steering mode according to device capabilities Or Gerlitz
2012-07-01 10:20   ` David Miller
2012-07-01  9:43 ` [PATCH net-next 05/10] net/mlx4_core: Add firmware commands to support device managed flow steering Or Gerlitz
2012-07-01  9:43 ` [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API Or Gerlitz
2012-07-01 10:30   ` David Miller
2012-07-01 12:29     ` Or Gerlitz
2012-07-01 21:42       ` David Miller
2012-07-02  7:55         ` Or Gerlitz
2012-07-02  8:14           ` Roland Dreier
2012-07-02  8:28             ` Or Gerlitz
2012-07-02  8:36               ` David Miller
2012-07-02 13:00                 ` Or Gerlitz
2012-07-02  8:34           ` David Miller
2012-07-02 18:07             ` Ben Hutchings
2012-07-03  0:15               ` David Miller
2012-07-03  1:04                 ` David Miller
2012-07-03 11:10                   ` Or Gerlitz
2012-07-01  9:43 ` [PATCH net-next 07/10] net/mlx4_core: Add resource tracking for device managed flow steering rules Or Gerlitz
2012-07-01 10:22   ` David Miller
2012-07-01  9:43 ` [PATCH net-next 08/10] net/mlx4: Implement promiscuous mode with device managed flow-steering Or Gerlitz
2012-07-01  9:43 ` [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool Or Gerlitz
2012-07-01 10:23   ` David Miller
2012-07-01 16:00   ` Ben Hutchings
2012-07-01 16:38     ` Joe Perches
2012-07-01 17:31       ` Joe Perches
2012-07-01 18:48         ` Andreas Schwab
2012-07-01 19:52           ` Joe Perches
2012-07-02 10:19             ` David Laight
2012-07-02 11:35               ` Andreas Schwab
2012-07-02 12:15                 ` David Laight
2012-07-03  8:14                 ` Or Gerlitz
2012-07-02 12:57     ` Or Gerlitz
2012-07-03  1:47       ` Ben Hutchings
2012-07-03  8:56         ` Or Gerlitz
2012-07-03  8:58         ` Or Gerlitz
2012-07-02 13:32     ` Or Gerlitz
2012-07-03  1:50       ` Ben Hutchings
2012-07-03  9:00     ` Or Gerlitz
2012-07-01  9:43 ` [PATCH net-next 10/10] net/mlx4_en: Add support for drop action through ethtool Or Gerlitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.