All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] net/mlx4_en: Add accelerated RFS support
@ 2012-07-18 14:19 Or Gerlitz
  2012-07-18 14:19 ` [PATCH net-next 1/4] net/mlx4: Move MAC_MASK to a common place Or Gerlitz
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Or Gerlitz @ 2012-07-18 14:19 UTC (permalink / raw)
  To: davem; +Cc: roland, netdev, oren, yevgenyp, Or Gerlitz, Amir Vadai

Hi Dave, 

So now a pure Ethernet post from us...

This series from Amir Vadai adds support for Accelerated RFS 
to the mlx4_en Ethernet driver.

The code uses the Accelerated RFS infrastructure and HW flow steering 
to keep CPU affinity of rx interrupts and applications per TCP stream.

To do so, we had to add little protection to cpu_rmap.h against double 
inclusion. Also, added linking between CPU to IRQ using rmap in the 
mlx4_core driver.

Or.

Amir Vadai (4):
  net/mlx4: Move MAC_MASK to a common place
  net/rps: Protect cpu_rmap.h from double inclusion
  {NET,IB}/mlx4: Add rmap support to mlx4_assign_eq
  net/mlx4_en: Add accelerated RFS support

 drivers/infiniband/hw/mlx4/main.c                  |    3 +-
 drivers/net/ethernet/mellanox/mlx4/en_cq.c         |    9 +-
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c    |    6 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c     |  316 ++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         |    3 +
 drivers/net/ethernet/mellanox/mlx4/eq.c            |   12 +-
 drivers/net/ethernet/mellanox/mlx4/mcg.c           |    1 -
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h       |   16 +
 drivers/net/ethernet/mellanox/mlx4/port.c          |    1 -
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |    3 +-
 include/linux/cpu_rmap.h                           |    4 +
 include/linux/mlx4/device.h                        |    4 +-
 include/linux/mlx4/driver.h                        |    2 +
 13 files changed, 369 insertions(+), 11 deletions(-)

CC: Amir Vadai <amirv@mellanox.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 1/4] net/mlx4: Move MAC_MASK to a common place
  2012-07-18 14:19 [PATCH net-next 0/4] net/mlx4_en: Add accelerated RFS support Or Gerlitz
@ 2012-07-18 14:19 ` Or Gerlitz
  2012-07-18 14:19 ` [PATCH net-next 2/4] net/rps: Protect cpu_rmap.h from double inclusion Or Gerlitz
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Or Gerlitz @ 2012-07-18 14:19 UTC (permalink / raw)
  To: davem; +Cc: roland, netdev, oren, yevgenyp, Amir Vadai, Or Gerlitz

From: Amir Vadai <amirv@mellanox.com>

Define this macro is one common place instead of duplicating it over the code

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c    |    6 +++---
 drivers/net/ethernet/mellanox/mlx4/mcg.c           |    1 -
 drivers/net/ethernet/mellanox/mlx4/port.c          |    1 -
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |    3 +--
 include/linux/mlx4/driver.h                        |    2 ++
 5 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index dd6a77b..9d0b88e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -34,12 +34,12 @@
 #include <linux/kernel.h>
 #include <linux/ethtool.h>
 #include <linux/netdevice.h>
+#include <linux/mlx4/driver.h>
 
 #include "mlx4_en.h"
 #include "en_port.h"
 
 #define EN_ETHTOOL_QP_ATTACH (1ull << 63)
-#define EN_ETHTOOL_MAC_MASK 0xffffffffffffULL
 #define EN_ETHTOOL_SHORT_MASK cpu_to_be16(0xffff)
 #define EN_ETHTOOL_WORD_MASK  cpu_to_be32(0xffffffff)
 
@@ -751,7 +751,7 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev,
 	struct ethhdr *eth_spec;
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_spec_list *spec_l2;
-	__be64 mac_msk = cpu_to_be64(EN_ETHTOOL_MAC_MASK << 16);
+	__be64 mac_msk = cpu_to_be64(MLX4_MAC_MASK << 16);
 
 	err = mlx4_en_validate_flow(dev, cmd);
 	if (err)
@@ -761,7 +761,7 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev,
 	if (!spec_l2)
 		return -ENOMEM;
 
-	mac = priv->mac & EN_ETHTOOL_MAC_MASK;
+	mac = priv->mac & MLX4_MAC_MASK;
 	be_mac = cpu_to_be64(mac << 16);
 
 	spec_l2->id = MLX4_NET_TRANS_RULE_ID_ETH;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index 5bac0df..4ec3835 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -41,7 +41,6 @@
 
 #define MGM_QPN_MASK       0x00FFFFFF
 #define MGM_BLCK_LB_BIT    30
-#define MLX4_MAC_MASK	   0xffffffffffffULL
 
 static const u8 zero_gid[16];	/* automatically initialized to 0 */
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
index a51d1b9..028833f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -39,7 +39,6 @@
 #include "mlx4.h"
 
 #define MLX4_MAC_VALID		(1ull << 63)
-#define MLX4_MAC_MASK		0xffffffffffffULL
 
 #define MLX4_VLAN_VALID		(1u << 31)
 #define MLX4_VLAN_MASK		0xfff
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index c3fa919..94ceddd 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -41,13 +41,12 @@
 #include <linux/slab.h>
 #include <linux/mlx4/cmd.h>
 #include <linux/mlx4/qp.h>
+#include <linux/if_ether.h>
 
 #include "mlx4.h"
 #include "fw.h"
 
 #define MLX4_MAC_VALID		(1ull << 63)
-#define MLX4_MAC_MASK		0x7fffffffffffffffULL
-#define ETH_ALEN		6
 
 struct mac_res {
 	struct list_head list;
diff --git a/include/linux/mlx4/driver.h b/include/linux/mlx4/driver.h
index 5f1298b..8dc485f 100644
--- a/include/linux/mlx4/driver.h
+++ b/include/linux/mlx4/driver.h
@@ -37,6 +37,8 @@
 
 struct mlx4_dev;
 
+#define MLX4_MAC_MASK	   0xffffffffffffULL
+
 enum mlx4_dev_event {
 	MLX4_DEV_EVENT_CATASTROPHIC_ERROR,
 	MLX4_DEV_EVENT_PORT_UP,
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 2/4] net/rps: Protect cpu_rmap.h from double inclusion
  2012-07-18 14:19 [PATCH net-next 0/4] net/mlx4_en: Add accelerated RFS support Or Gerlitz
  2012-07-18 14:19 ` [PATCH net-next 1/4] net/mlx4: Move MAC_MASK to a common place Or Gerlitz
@ 2012-07-18 14:19 ` Or Gerlitz
  2012-07-18 14:19 ` [PATCH net-next 3/4] {NET,IB}/mlx4: Add rmap support to mlx4_assign_eq Or Gerlitz
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Or Gerlitz @ 2012-07-18 14:19 UTC (permalink / raw)
  To: davem; +Cc: roland, netdev, oren, yevgenyp, Amir Vadai, Or Gerlitz

From: Amir Vadai <amirv@mellanox.com>

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 include/linux/cpu_rmap.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/linux/cpu_rmap.h b/include/linux/cpu_rmap.h
index 473771a..ac3bbb5 100644
--- a/include/linux/cpu_rmap.h
+++ b/include/linux/cpu_rmap.h
@@ -1,3 +1,6 @@
+#ifndef __LINUX_CPU_RMAP_H
+#define __LINUX_CPU_RMAP_H
+
 /*
  * cpu_rmap.c: CPU affinity reverse-map support
  * Copyright 2011 Solarflare Communications Inc.
@@ -71,3 +74,4 @@ extern void free_irq_cpu_rmap(struct cpu_rmap *rmap);
 extern int irq_cpu_rmap_add(struct cpu_rmap *rmap, int irq);
 
 #endif
+#endif /* __LINUX_CPU_RMAP_H */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 3/4] {NET,IB}/mlx4: Add rmap support to mlx4_assign_eq
  2012-07-18 14:19 [PATCH net-next 0/4] net/mlx4_en: Add accelerated RFS support Or Gerlitz
  2012-07-18 14:19 ` [PATCH net-next 1/4] net/mlx4: Move MAC_MASK to a common place Or Gerlitz
  2012-07-18 14:19 ` [PATCH net-next 2/4] net/rps: Protect cpu_rmap.h from double inclusion Or Gerlitz
@ 2012-07-18 14:19 ` Or Gerlitz
  2012-07-18 14:19 ` [PATCH net-next 4/4] net/mlx4_en: Add accelerated RFS support Or Gerlitz
  2012-07-18 18:41 ` [PATCH net-next 0/4] " David Miller
  4 siblings, 0 replies; 10+ messages in thread
From: Or Gerlitz @ 2012-07-18 14:19 UTC (permalink / raw)
  To: davem; +Cc: roland, netdev, oren, yevgenyp, Amir Vadai, Or Gerlitz

From: Amir Vadai <amirv@mellanox.com>

Enable callers of mlx4_assign_eq to supply a pointer to cpu_rmap.
If supplied, the assigned IRQ is tracked using rmap infrastructure.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/infiniband/hw/mlx4/main.c          |    3 ++-
 drivers/net/ethernet/mellanox/mlx4/en_cq.c |    3 ++-
 drivers/net/ethernet/mellanox/mlx4/eq.c    |   12 +++++++++++-
 include/linux/mlx4/device.h                |    4 +++-
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 8a3a203..a07b774 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1159,7 +1159,8 @@ static void mlx4_ib_alloc_eqs(struct mlx4_dev *dev, struct mlx4_ib_dev *ibdev)
 			sprintf(name, "mlx4-ib-%d-%d@%s",
 				i, j, dev->pdev->bus->name);
 			/* Set IRQ for specific name (per ring) */
-			if (mlx4_assign_eq(dev, name, &ibdev->eq_table[eq])) {
+			if (mlx4_assign_eq(dev, name, NULL,
+					   &ibdev->eq_table[eq])) {
 				/* Use legacy (same as mlx4_en driver) */
 				pr_warn("Can't allocate EQ %d; reverting to legacy\n", eq);
 				ibdev->eq_table[eq] =
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 908a460..0ef6156 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -91,7 +91,8 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 				sprintf(name, "%s-%d", priv->dev->name,
 					cq->ring);
 				/* Set IRQ for specific name (per ring) */
-				if (mlx4_assign_eq(mdev->dev, name, &cq->vector)) {
+				if (mlx4_assign_eq(mdev->dev, name, NULL,
+						   &cq->vector)) {
 					cq->vector = (cq->ring + 1 + priv->port)
 					    % mdev->dev->caps.num_comp_vectors;
 					mlx4_warn(mdev, "Failed Assigning an EQ to "
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
index bce98d9..12c3ed2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -39,6 +39,7 @@
 #include <linux/dma-mapping.h>
 
 #include <linux/mlx4/cmd.h>
+#include <linux/cpu_rmap.h>
 
 #include "mlx4.h"
 #include "fw.h"
@@ -1060,7 +1061,8 @@ int mlx4_test_interrupts(struct mlx4_dev *dev)
 }
 EXPORT_SYMBOL(mlx4_test_interrupts);
 
-int mlx4_assign_eq(struct mlx4_dev *dev, char* name, int * vector)
+int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
+		   int *vector)
 {
 
 	struct mlx4_priv *priv = mlx4_priv(dev);
@@ -1074,6 +1076,14 @@ int mlx4_assign_eq(struct mlx4_dev *dev, char* name, int * vector)
 			snprintf(priv->eq_table.irq_names +
 					vec * MLX4_IRQNAME_SIZE,
 					MLX4_IRQNAME_SIZE, "%s", name);
+#ifdef CONFIG_CPU_RMAP
+			if (rmap) {
+				err = irq_cpu_rmap_add(rmap,
+						       priv->eq_table.eq[vec].irq);
+				if (err)
+					mlx4_warn(dev, "Failed adding irq rmap\n");
+			}
+#endif
 			err = request_irq(priv->eq_table.eq[vec].irq,
 					  mlx4_msi_x_interrupt, 0,
 					  &priv->eq_table.irq_names[vec<<5],
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 6f0d133..4d7761f 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -36,6 +36,7 @@
 #include <linux/pci.h>
 #include <linux/completion.h>
 #include <linux/radix-tree.h>
+#include <linux/cpu_rmap.h>
 
 #include <linux/atomic.h>
 
@@ -784,7 +785,8 @@ void mlx4_fmr_unmap(struct mlx4_dev *dev, struct mlx4_fmr *fmr,
 int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr);
 int mlx4_SYNC_TPT(struct mlx4_dev *dev);
 int mlx4_test_interrupts(struct mlx4_dev *dev);
-int mlx4_assign_eq(struct mlx4_dev *dev, char* name , int* vector);
+int mlx4_assign_eq(struct mlx4_dev *dev, char *name, struct cpu_rmap *rmap,
+		   int *vector);
 void mlx4_release_eq(struct mlx4_dev *dev, int vec);
 
 int mlx4_wol_read(struct mlx4_dev *dev, u64 *config, int port);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 4/4] net/mlx4_en: Add accelerated RFS support
  2012-07-18 14:19 [PATCH net-next 0/4] net/mlx4_en: Add accelerated RFS support Or Gerlitz
                   ` (2 preceding siblings ...)
  2012-07-18 14:19 ` [PATCH net-next 3/4] {NET,IB}/mlx4: Add rmap support to mlx4_assign_eq Or Gerlitz
@ 2012-07-18 14:19 ` Or Gerlitz
  2012-07-18 18:41 ` [PATCH net-next 0/4] " David Miller
  4 siblings, 0 replies; 10+ messages in thread
From: Or Gerlitz @ 2012-07-18 14:19 UTC (permalink / raw)
  To: davem; +Cc: roland, netdev, oren, yevgenyp, Amir Vadai, Or Gerlitz

From: Amir Vadai <amirv@mellanox.com>

Use RFS infrastructure and flow steering in HW to keep CPU
affinity of rx interrupts and application per TCP stream.

A flow steering filter is added to the HW whenever the RFS
ndo callback is invoked by core networking code.

Because the invocation takes place in interrupt context, the
actual setup of HW is done using workqueue. Whenever new filter
is added, the driver checks for expiry of existing filters.

Since there's window in time between the point where the core
RFS code invoked the ndo callback, to the point where the HW
is configured from the workqueue context, the 2nd, 3rd etc
packets from that stream will cause the net core to invoke
the callback again and again.

To prevent inefficient/double configuration of the HW, the filters
are kept in a database which is indexed using hash function to enable
fast access.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_cq.c     |    8 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  316 ++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     |    3 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |   16 ++
 4 files changed, 342 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
index 0ef6156..2d6f1ba 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c
@@ -77,6 +77,12 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 	struct mlx4_en_dev *mdev = priv->mdev;
 	int err = 0;
 	char name[25];
+	struct cpu_rmap *rmap =
+#ifdef CONFIG_CPU_RMAP
+		priv->dev->rx_cpu_rmap;
+#else
+		NULL;
+#endif
 
 	cq->dev = mdev->pndev[priv->port];
 	cq->mcq.set_ci_db  = cq->wqres.db.db;
@@ -91,7 +97,7 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq,
 				sprintf(name, "%s-%d", priv->dev->name,
 					cq->ring);
 				/* Set IRQ for specific name (per ring) */
-				if (mlx4_assign_eq(mdev->dev, name, NULL,
+				if (mlx4_assign_eq(mdev->dev, name, rmap,
 						   &cq->vector)) {
 					cq->vector = (cq->ring + 1 + priv->port)
 					    % mdev->dev->caps.num_comp_vectors;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 4ce5ca8..8864d8b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -36,6 +36,8 @@
 #include <linux/if_vlan.h>
 #include <linux/delay.h>
 #include <linux/slab.h>
+#include <linux/hash.h>
+#include <net/ip.h>
 
 #include <linux/mlx4/driver.h>
 #include <linux/mlx4/device.h>
@@ -66,6 +68,299 @@ static int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 	return 0;
 }
 
+#ifdef CONFIG_RFS_ACCEL
+
+struct mlx4_en_filter {
+	struct list_head next;
+	struct work_struct work;
+
+	__be32 src_ip;
+	__be32 dst_ip;
+	__be16 src_port;
+	__be16 dst_port;
+
+	int rxq_index;
+	struct mlx4_en_priv *priv;
+	u32 flow_id;			/* RFS infrastructure id */
+	int id;				/* mlx4_en driver id */
+	u64 reg_id;			/* Flow steering API id */
+	u8 activated;			/* Used to prevent expiry before filter
+					 * is attached
+					 */
+	struct hlist_node filter_chain;
+};
+
+static void mlx4_en_filter_rfs_expire(struct mlx4_en_priv *priv);
+
+static void mlx4_en_filter_work(struct work_struct *work)
+{
+	struct mlx4_en_filter *filter = container_of(work,
+						     struct mlx4_en_filter,
+						     work);
+	struct mlx4_en_priv *priv = filter->priv;
+	struct mlx4_spec_list spec_tcp = {
+		.id = MLX4_NET_TRANS_RULE_ID_TCP,
+		{
+			.tcp_udp = {
+				.dst_port = filter->dst_port,
+				.dst_port_msk = (__force __be16)-1,
+				.src_port = filter->src_port,
+				.src_port_msk = (__force __be16)-1,
+			},
+		},
+	};
+	struct mlx4_spec_list spec_ip = {
+		.id = MLX4_NET_TRANS_RULE_ID_IPV4,
+		{
+			.ipv4 = {
+				.dst_ip = filter->dst_ip,
+				.dst_ip_msk = (__force __be32)-1,
+				.src_ip = filter->src_ip,
+				.src_ip_msk = (__force __be32)-1,
+			},
+		},
+	};
+	struct mlx4_spec_list spec_eth = {
+		.id = MLX4_NET_TRANS_RULE_ID_ETH,
+	};
+	struct mlx4_net_trans_rule rule = {
+		.list = LIST_HEAD_INIT(rule.list),
+		.queue_mode = MLX4_NET_TRANS_Q_LIFO,
+		.exclusive = 1,
+		.allow_loopback = 1,
+		.promisc_mode = MLX4_FS_PROMISC_NONE,
+		.port = priv->port,
+		.priority = MLX4_DOMAIN_RFS,
+	};
+	int rc;
+	__be64 mac;
+	__be64 mac_mask = cpu_to_be64(MLX4_MAC_MASK << 16);
+
+	list_add_tail(&spec_eth.list, &rule.list);
+	list_add_tail(&spec_ip.list, &rule.list);
+	list_add_tail(&spec_tcp.list, &rule.list);
+
+	mac = cpu_to_be64((priv->mac & MLX4_MAC_MASK) << 16);
+
+	rule.qpn = priv->rss_map.qps[filter->rxq_index].qpn;
+	memcpy(spec_eth.eth.dst_mac, &mac, ETH_ALEN);
+	memcpy(spec_eth.eth.dst_mac_msk, &mac_mask, ETH_ALEN);
+
+	filter->activated = 0;
+
+	if (filter->reg_id) {
+		rc = mlx4_flow_detach(priv->mdev->dev, filter->reg_id);
+		if (rc && rc != -ENOENT)
+			en_err(priv, "Error detaching flow. rc = %d\n", rc);
+	}
+
+	rc = mlx4_flow_attach(priv->mdev->dev, &rule, &filter->reg_id);
+	if (rc)
+		en_err(priv, "Error attaching flow. err = %d\n", rc);
+
+	mlx4_en_filter_rfs_expire(priv);
+
+	filter->activated = 1;
+}
+
+static inline struct hlist_head *
+filter_hash_bucket(struct mlx4_en_priv *priv, __be32 src_ip, __be32 dst_ip,
+		   __be16 src_port, __be16 dst_port)
+{
+	unsigned long l;
+	int bucket_idx;
+
+	l = (__force unsigned long)src_port |
+	    ((__force unsigned long)dst_port << 2);
+	l ^= (__force unsigned long)(src_ip ^ dst_ip);
+
+	bucket_idx = hash_long(l, MLX4_EN_FILTER_HASH_SHIFT);
+
+	return &priv->filter_hash[bucket_idx];
+}
+
+static struct mlx4_en_filter *
+mlx4_en_filter_alloc(struct mlx4_en_priv *priv, int rxq_index, __be32 src_ip,
+		     __be32 dst_ip, __be16 src_port, __be16 dst_port,
+		     u32 flow_id)
+{
+	struct mlx4_en_filter *filter = NULL;
+
+	filter = kzalloc(sizeof(struct mlx4_en_filter), GFP_ATOMIC);
+	if (!filter)
+		return NULL;
+
+	filter->priv = priv;
+	filter->rxq_index = rxq_index;
+	INIT_WORK(&filter->work, mlx4_en_filter_work);
+
+	filter->src_ip = src_ip;
+	filter->dst_ip = dst_ip;
+	filter->src_port = src_port;
+	filter->dst_port = dst_port;
+
+	filter->flow_id = flow_id;
+
+	filter->id = priv->last_filter_id++;
+
+	list_add_tail(&filter->next, &priv->filters);
+	hlist_add_head(&filter->filter_chain,
+		       filter_hash_bucket(priv, src_ip, dst_ip, src_port,
+					  dst_port));
+
+	return filter;
+}
+
+static void mlx4_en_filter_free(struct mlx4_en_filter *filter)
+{
+	struct mlx4_en_priv *priv = filter->priv;
+	int rc;
+
+	list_del(&filter->next);
+
+	rc = mlx4_flow_detach(priv->mdev->dev, filter->reg_id);
+	if (rc && rc != -ENOENT)
+		en_err(priv, "Error detaching flow. rc = %d\n", rc);
+
+	kfree(filter);
+}
+
+static inline struct mlx4_en_filter *
+mlx4_en_filter_find(struct mlx4_en_priv *priv, __be32 src_ip, __be32 dst_ip,
+		    __be16 src_port, __be16 dst_port)
+{
+	struct hlist_node *elem;
+	struct mlx4_en_filter *filter;
+	struct mlx4_en_filter *ret = NULL;
+
+	hlist_for_each_entry(filter, elem,
+			     filter_hash_bucket(priv, src_ip, dst_ip,
+						src_port, dst_port),
+			     filter_chain) {
+		if (filter->src_ip == src_ip &&
+		    filter->dst_ip == dst_ip &&
+		    filter->src_port == src_port &&
+		    filter->dst_port == dst_port) {
+			ret = filter;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static int
+mlx4_en_filter_rfs(struct net_device *net_dev, const struct sk_buff *skb,
+		   u16 rxq_index, u32 flow_id)
+{
+	struct mlx4_en_priv *priv = netdev_priv(net_dev);
+	struct mlx4_en_filter *filter;
+	const struct iphdr *ip;
+	const __be16 *ports;
+	__be32 src_ip;
+	__be32 dst_ip;
+	__be16 src_port;
+	__be16 dst_port;
+	int nhoff = skb_network_offset(skb);
+	int ret = 0;
+
+	if (skb->protocol != htons(ETH_P_IP))
+		return -EPROTONOSUPPORT;
+
+	ip = (const struct iphdr *)(skb->data + nhoff);
+	if (ip_is_fragment(ip))
+		return -EPROTONOSUPPORT;
+
+	ports = (const __be16 *)(skb->data + nhoff + 4 * ip->ihl);
+
+	src_ip = ip->saddr;
+	dst_ip = ip->daddr;
+	src_port = ports[0];
+	dst_port = ports[1];
+
+	if (ip->protocol != IPPROTO_TCP)
+		return -EPROTONOSUPPORT;
+
+	spin_lock_bh(&priv->filters_lock);
+	filter = mlx4_en_filter_find(priv, src_ip, dst_ip, src_port, dst_port);
+	if (filter) {
+		if (filter->rxq_index == rxq_index)
+			goto out;
+
+		filter->rxq_index = rxq_index;
+	} else {
+		filter = mlx4_en_filter_alloc(priv, rxq_index,
+					      src_ip, dst_ip,
+					      src_port, dst_port, flow_id);
+		if (!filter) {
+			ret = -ENOMEM;
+			goto err;
+		}
+	}
+
+	queue_work(priv->mdev->workqueue, &filter->work);
+
+out:
+	ret = filter->id;
+err:
+	spin_unlock_bh(&priv->filters_lock);
+
+	return ret;
+}
+
+void mlx4_en_cleanup_filters(struct mlx4_en_priv *priv,
+			     struct mlx4_en_rx_ring *rx_ring)
+{
+	struct mlx4_en_filter *filter, *tmp;
+	LIST_HEAD(del_list);
+
+	spin_lock_bh(&priv->filters_lock);
+	list_for_each_entry_safe(filter, tmp, &priv->filters, next) {
+		list_move(&filter->next, &del_list);
+		hlist_del(&filter->filter_chain);
+	}
+	spin_unlock_bh(&priv->filters_lock);
+
+	list_for_each_entry_safe(filter, tmp, &del_list, next) {
+		cancel_work_sync(&filter->work);
+		mlx4_en_filter_free(filter);
+	}
+}
+
+static void mlx4_en_filter_rfs_expire(struct mlx4_en_priv *priv)
+{
+	struct mlx4_en_filter *filter = NULL, *tmp, *last_filter = NULL;
+	LIST_HEAD(del_list);
+	int i = 0;
+
+	spin_lock_bh(&priv->filters_lock);
+	list_for_each_entry_safe(filter, tmp, &priv->filters, next) {
+		if (i > MLX4_EN_FILTER_EXPIRY_QUOTA)
+			break;
+
+		if (filter->activated &&
+		    !work_pending(&filter->work) &&
+		    rps_may_expire_flow(priv->dev,
+					filter->rxq_index, filter->flow_id,
+					filter->id)) {
+			list_move(&filter->next, &del_list);
+			hlist_del(&filter->filter_chain);
+		} else
+			last_filter = filter;
+
+		i++;
+	}
+
+	if (last_filter && (&last_filter->next != priv->filters.next))
+		list_move(&priv->filters, &last_filter->next);
+
+	spin_unlock_bh(&priv->filters_lock);
+
+	list_for_each_entry_safe(filter, tmp, &del_list, next)
+		mlx4_en_filter_free(filter);
+}
+#endif
+
 static int mlx4_en_vlan_rx_add_vid(struct net_device *dev, unsigned short vid)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
@@ -1079,6 +1374,11 @@ void mlx4_en_free_resources(struct mlx4_en_priv *priv)
 {
 	int i;
 
+#ifdef CONFIG_RFS_ACCEL
+	free_irq_cpu_rmap(priv->dev->rx_cpu_rmap);
+	priv->dev->rx_cpu_rmap = NULL;
+#endif
+
 	for (i = 0; i < priv->tx_ring_num; i++) {
 		if (priv->tx_ring[i].tx_info)
 			mlx4_en_destroy_tx_ring(priv, &priv->tx_ring[i]);
@@ -1134,6 +1434,15 @@ int mlx4_en_alloc_resources(struct mlx4_en_priv *priv)
 			goto err;
 	}
 
+#ifdef CONFIG_RFS_ACCEL
+	priv->dev->rx_cpu_rmap = alloc_irq_cpu_rmap(priv->rx_ring_num);
+	if (!priv->dev->rx_cpu_rmap)
+		goto err;
+
+	INIT_LIST_HEAD(&priv->filters);
+	spin_lock_init(&priv->filters_lock);
+#endif
+
 	return 0;
 
 err:
@@ -1241,6 +1550,9 @@ static const struct net_device_ops mlx4_netdev_ops = {
 #endif
 	.ndo_set_features	= mlx4_en_set_features,
 	.ndo_setup_tc		= mlx4_en_setup_tc,
+#ifdef CONFIG_RFS_ACCEL
+	.ndo_rx_flow_steer	= mlx4_en_filter_rfs,
+#endif
 };
 
 int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
@@ -1358,6 +1670,10 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 			NETIF_F_HW_VLAN_FILTER;
 	dev->hw_features |= NETIF_F_LOOPBACK;
 
+	if (mdev->dev->caps.steering_mode ==
+	    MLX4_STEERING_MODE_DEVICE_MANAGED)
+		dev->hw_features |= NETIF_F_NTUPLE;
+
 	mdev->pndev[port] = dev;
 
 	netif_carrier_off(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index a04cbf7..796cd58 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -389,6 +389,9 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
 	mlx4_free_hwq_res(mdev->dev, &ring->wqres, size * stride + TXBB_SIZE);
 	vfree(ring->rx_info);
 	ring->rx_info = NULL;
+#ifdef CONFIG_RFS_ACCEL
+	mlx4_en_cleanup_filters(priv, ring);
+#endif
 }
 
 void mlx4_en_deactivate_rx_ring(struct mlx4_en_priv *priv,
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index a126321..af34c98 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -43,6 +43,7 @@
 #ifdef CONFIG_MLX4_EN_DCB
 #include <linux/dcbnl.h>
 #endif
+#include <linux/cpu_rmap.h>
 
 #include <linux/mlx4/device.h>
 #include <linux/mlx4/qp.h>
@@ -77,6 +78,9 @@
 #define STATS_DELAY		(HZ / 4)
 #define MAX_NUM_OF_FS_RULES	256
 
+#define MLX4_EN_FILTER_HASH_SHIFT 4
+#define MLX4_EN_FILTER_EXPIRY_QUOTA 60
+
 /* Typical TSO descriptor with 16 gather entries is 352 bytes... */
 #define MAX_DESC_SIZE		512
 #define MAX_DESC_TXBBS		(MAX_DESC_SIZE / TXBB_SIZE)
@@ -523,6 +527,13 @@ struct mlx4_en_priv {
 	struct ieee_ets ets;
 	u16 maxrate[IEEE_8021QAZ_MAX_TCS];
 #endif
+#ifdef CONFIG_RFS_ACCEL
+	spinlock_t filters_lock;
+	int last_filter_id;
+	struct list_head filters;
+	struct hlist_head filter_hash[1 << MLX4_EN_FILTER_HASH_SHIFT];
+#endif
+
 };
 
 enum mlx4_en_wol {
@@ -602,6 +613,11 @@ int mlx4_en_QUERY_PORT(struct mlx4_en_dev *mdev, u8 port);
 extern const struct dcbnl_rtnl_ops mlx4_en_dcbnl_ops;
 #endif
 
+#ifdef CONFIG_RFS_ACCEL
+void mlx4_en_cleanup_filters(struct mlx4_en_priv *priv,
+			     struct mlx4_en_rx_ring *rx_ring);
+#endif
+
 #define MLX4_EN_NUM_SELF_TEST	5
 void mlx4_en_ex_selftest(struct net_device *dev, u32 *flags, u64 *buf);
 u64 mlx4_en_mac_to_u64(u8 *addr);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx4_en: Add accelerated RFS support
  2012-07-18 14:19 [PATCH net-next 0/4] net/mlx4_en: Add accelerated RFS support Or Gerlitz
                   ` (3 preceding siblings ...)
  2012-07-18 14:19 ` [PATCH net-next 4/4] net/mlx4_en: Add accelerated RFS support Or Gerlitz
@ 2012-07-18 18:41 ` David Miller
  2012-07-18 21:22   ` Or Gerlitz
  4 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2012-07-18 18:41 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, netdev, oren, yevgenyp, amirv

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Wed, 18 Jul 2012 17:19:18 +0300

> This series from Amir Vadai adds support for Accelerated RFS 
> to the mlx4_en Ethernet driver.
> 
> The code uses the Accelerated RFS infrastructure and HW flow steering 
> to keep CPU affinity of rx interrupts and applications per TCP stream.
> 
> To do so, we had to add little protection to cpu_rmap.h against double 
> inclusion. Also, added linking between CPU to IRQ using rmap in the 
> mlx4_core driver.

Please use CONFIG_RFS_ACCEL consistently to protect this feature
in your driver sources.

Using CPU_RMAP in a few places is inconsistent, and not what other
drivers do.

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx4_en: Add accelerated RFS support
  2012-07-18 18:41 ` [PATCH net-next 0/4] " David Miller
@ 2012-07-18 21:22   ` Or Gerlitz
  2012-07-18 21:34     ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2012-07-18 21:22 UTC (permalink / raw)
  To: David Miller; +Cc: roland, netdev, oren, yevgenyp, amirv

On Wed, Jul 18, 2012 at 9:41 PM, David Miller <davem@davemloft.net> wrote:

> Please use CONFIG_RFS_ACCEL consistently to protect this feature
> in your driver sources.
>
> Using CPU_RMAP in a few places is inconsistent, and not what other
> drivers do.

We're indeed using CONFIG_RFS_ACCEL across the place, and only in two
places which deal directly with rmap use CPU_RMAP. The latter is
selected by RFS_ACCEL, but if you feel this is inconsistent, will
change to use only one define.

Or.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx4_en: Add accelerated RFS support
  2012-07-18 21:22   ` Or Gerlitz
@ 2012-07-18 21:34     ` David Miller
  2012-07-18 21:36       ` Or Gerlitz
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2012-07-18 21:34 UTC (permalink / raw)
  To: or.gerlitz; +Cc: roland, netdev, oren, yevgenyp, amirv

From: Or Gerlitz <or.gerlitz@gmail.com>
Date: Thu, 19 Jul 2012 00:22:42 +0300

> On Wed, Jul 18, 2012 at 9:41 PM, David Miller <davem@davemloft.net> wrote:
> 
>> Please use CONFIG_RFS_ACCEL consistently to protect this feature
>> in your driver sources.
>>
>> Using CPU_RMAP in a few places is inconsistent, and not what other
>> drivers do.
> 
> We're indeed using CONFIG_RFS_ACCEL across the place, and only in two
> places which deal directly with rmap use CPU_RMAP. The latter is
> selected by RFS_ACCEL, but if you feel this is inconsistent, will
> change to use only one define.

That's basically what I said.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx4_en: Add accelerated RFS support
  2012-07-18 21:34     ` David Miller
@ 2012-07-18 21:36       ` Or Gerlitz
  2012-07-18 21:37         ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Or Gerlitz @ 2012-07-18 21:36 UTC (permalink / raw)
  To: David Miller; +Cc: roland, netdev, oren, yevgenyp, amirv

On Thu, Jul 19, 2012 at 12:34 AM, David Miller <davem@davemloft.net> wrote:

> That's basically what I said.

sure, will follow. Any further comments that we should fix in this series?

Or.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 0/4] net/mlx4_en: Add accelerated RFS support
  2012-07-18 21:36       ` Or Gerlitz
@ 2012-07-18 21:37         ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2012-07-18 21:37 UTC (permalink / raw)
  To: or.gerlitz; +Cc: roland, netdev, oren, yevgenyp, amirv

From: Or Gerlitz <or.gerlitz@gmail.com>
Date: Thu, 19 Jul 2012 00:36:12 +0300

> On Thu, Jul 19, 2012 at 12:34 AM, David Miller <davem@davemloft.net> wrote:
> 
>> That's basically what I said.
> 
> sure, will follow. Any further comments that we should fix in this series?

Nope, it otherwise looked good to me.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-07-18 21:37 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-18 14:19 [PATCH net-next 0/4] net/mlx4_en: Add accelerated RFS support Or Gerlitz
2012-07-18 14:19 ` [PATCH net-next 1/4] net/mlx4: Move MAC_MASK to a common place Or Gerlitz
2012-07-18 14:19 ` [PATCH net-next 2/4] net/rps: Protect cpu_rmap.h from double inclusion Or Gerlitz
2012-07-18 14:19 ` [PATCH net-next 3/4] {NET,IB}/mlx4: Add rmap support to mlx4_assign_eq Or Gerlitz
2012-07-18 14:19 ` [PATCH net-next 4/4] net/mlx4_en: Add accelerated RFS support Or Gerlitz
2012-07-18 18:41 ` [PATCH net-next 0/4] " David Miller
2012-07-18 21:22   ` Or Gerlitz
2012-07-18 21:34     ` David Miller
2012-07-18 21:36       ` Or Gerlitz
2012-07-18 21:37         ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.