All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/8] enic updates
@ 2014-06-09 18:32 Govindarajulu Varadarajan
  2014-06-09 18:32 ` [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect() Govindarajulu Varadarajan
                   ` (6 more replies)
  0 siblings, 7 replies; 36+ messages in thread
From: Govindarajulu Varadarajan @ 2014-06-09 18:32 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, gvaradar, benve, _govind

This series fixes minor bugs and adds new features like Accelerated RFS,
busy_poll, tx clean-up in napi_poll.

Please apply it to net-next.

Govindarajulu Varadarajan (7):
  flow_keys: Record IP layer protocol in skb_flow_dissect()
  enic: fix return value in _vnic_dev_cmd
  enic: devcmd for adding IP 5 tuple hardware filters
  enic: alloc/free rx_cpu_rmap
  enic: Add Accelerated RFS support
  enic: add low latency socket busy_poll support
  enic: do tx cleanup in napi poll

Tony Camuso (1):
  enic: fix lockdep around devcmd_lock

 drivers/net/ethernet/cisco/enic/Makefile      |   2 +-
 drivers/net/ethernet/cisco/enic/enic.h        |  43 +++-
 drivers/net/ethernet/cisco/enic/enic_api.c    |   4 +-
 drivers/net/ethernet/cisco/enic/enic_clsf.c   | 269 ++++++++++++++++++++++++++
 drivers/net/ethernet/cisco/enic/enic_clsf.h   |  19 ++
 drivers/net/ethernet/cisco/enic/enic_dev.c    |  80 ++++----
 drivers/net/ethernet/cisco/enic/enic_dev.h    |   4 +-
 drivers/net/ethernet/cisco/enic/enic_main.c   | 239 +++++++++++++++++------
 drivers/net/ethernet/cisco/enic/enic_res.c    |   1 +
 drivers/net/ethernet/cisco/enic/vnic_dev.c    |  65 ++++++-
 drivers/net/ethernet/cisco/enic/vnic_dev.h    |   2 +
 drivers/net/ethernet/cisco/enic/vnic_devcmd.h |   5 +
 drivers/net/ethernet/cisco/enic/vnic_enet.h   |   2 +
 drivers/net/ethernet/cisco/enic/vnic_rq.h     | 122 ++++++++++++
 include/net/flow_keys.h                       |  14 ++
 include/net/sch_generic.h                     |   2 +-
 net/core/flow_dissector.c                     |   1 +
 17 files changed, 771 insertions(+), 103 deletions(-)
 create mode 100644 drivers/net/ethernet/cisco/enic/enic_clsf.c
 create mode 100644 drivers/net/ethernet/cisco/enic/enic_clsf.h

-- 
2.0.0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect()
  2014-06-09 18:32 [PATCH net-next 0/8] enic updates Govindarajulu Varadarajan
@ 2014-06-09 18:32 ` Govindarajulu Varadarajan
  2014-06-10 13:11   ` Daniel Borkmann
                     ` (2 more replies)
  2014-06-09 18:32 ` [PATCH net-next 2/8] enic: fix return value in _vnic_dev_cmd Govindarajulu Varadarajan
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 36+ messages in thread
From: Govindarajulu Varadarajan @ 2014-06-09 18:32 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, gvaradar, benve, _govind

skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
give any information about IPv4 or IPv6.

This patch adds new member, n_proto, to struct flow_keys. Which records the
IP layer type. i.e IPv4 or IPv6.

This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.

Adding new member to flow_keys increases the struct size by around 4 bytes.
This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
qdisc_cb_private_validate()

So increase data size by 4

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
 include/net/flow_keys.h   | 14 ++++++++++++++
 include/net/sch_generic.h |  2 +-
 net/core/flow_dissector.c |  1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
index 7e64bd8..fbefdca 100644
--- a/include/net/flow_keys.h
+++ b/include/net/flow_keys.h
@@ -1,6 +1,19 @@
 #ifndef _NET_FLOW_KEYS_H
 #define _NET_FLOW_KEYS_H
 
+/* struct flow_keys:
+ *	@src: source ip address in case of IPv4
+ *	      For IPv6 it contains 32bit hash of src address
+ *	@dst: destination ip address in case of IPv4
+ *	      For IPv6 it contains 32bit hash of dst address
+ *	@ports: port numbers of Transport header
+ *		port16[0]: src port number
+ *		port16[1]: dst port number
+ *	@thoff: Transport header offset
+ *	@n_proto: Network header protocol (eg. IPv4/IPv6)
+ *	@ip_proto: Transport header protocol (eg. TCP/UDP)
+ * All the members, except thoff, are in network byte order.
+ */
 struct flow_keys {
 	/* (src,dst) must be grouped, in the same way than in IP header */
 	__be32 src;
@@ -10,6 +23,7 @@ struct flow_keys {
 		__be16 port16[2];
 	};
 	u16 thoff;
+	u16 n_proto;
 	u8 ip_proto;
 };
 
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 624f985..a3cfb8e 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -231,7 +231,7 @@ struct qdisc_skb_cb {
 	unsigned int		pkt_len;
 	u16			slave_dev_queue_mapping;
 	u16			_pad;
-	unsigned char		data[20];
+	unsigned char		data[24];
 };
 
 static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 107ed12..c2b53c1 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -175,6 +175,7 @@ ipv6:
 		break;
 	}
 
+	flow->n_proto = proto;
 	flow->ip_proto = ip_proto;
 	flow->ports = skb_flow_get_ports(skb, nhoff, ip_proto);
 	flow->thoff = (u16) nhoff;
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH net-next 2/8] enic: fix return value in _vnic_dev_cmd
  2014-06-09 18:32 [PATCH net-next 0/8] enic updates Govindarajulu Varadarajan
  2014-06-09 18:32 ` [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect() Govindarajulu Varadarajan
@ 2014-06-09 18:32 ` Govindarajulu Varadarajan
  2014-06-09 18:32 ` [PATCH net-next 3/8] enic: devcmd for adding IP 5 tuple hardware filters Govindarajulu Varadarajan
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Govindarajulu Varadarajan @ 2014-06-09 18:32 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, gvaradar, benve, _govind

Hardware (in readq(&devcmd->args[0])) returns positive number in case of error.
But _vnic_dev_cmd should return a negative value in case of error.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
 drivers/net/ethernet/cisco/enic/vnic_dev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.c b/drivers/net/ethernet/cisco/enic/vnic_dev.c
index e86a45c..263081b 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_dev.c
+++ b/drivers/net/ethernet/cisco/enic/vnic_dev.c
@@ -312,12 +312,12 @@ static int _vnic_dev_cmd(struct vnic_dev *vdev, enum vnic_devcmd_cmd cmd,
 				err = (int)readq(&devcmd->args[0]);
 				if (err == ERR_EINVAL &&
 				    cmd == CMD_CAPABILITY)
-					return err;
+					return -err;
 				if (err != ERR_ECMDUNKNOWN ||
 				    cmd != CMD_CAPABILITY)
 					pr_err("Error %d devcmd %d\n",
 						err, _CMD_N(cmd));
-				return err;
+				return -err;
 			}
 
 			if (_CMD_DIR(cmd) & _CMD_DIR_READ) {
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH net-next 3/8] enic: devcmd for adding IP 5 tuple hardware filters
  2014-06-09 18:32 [PATCH net-next 0/8] enic updates Govindarajulu Varadarajan
  2014-06-09 18:32 ` [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect() Govindarajulu Varadarajan
  2014-06-09 18:32 ` [PATCH net-next 2/8] enic: fix return value in _vnic_dev_cmd Govindarajulu Varadarajan
@ 2014-06-09 18:32 ` Govindarajulu Varadarajan
  2014-06-09 18:32 ` [PATCH net-next 4/8] enic: alloc/free rx_cpu_rmap Govindarajulu Varadarajan
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Govindarajulu Varadarajan @ 2014-06-09 18:32 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, gvaradar, benve, _govind

This patch adds interface to add and delete IP 5 tuple filter. This interface
is used by Accelerated RFS code to steer a flow to corresponding receive
queue.

As of now adaptor supports only ipv4 + tcp/udp packet steering.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
 drivers/net/ethernet/cisco/enic/Makefile      |  2 +-
 drivers/net/ethernet/cisco/enic/enic_clsf.c   | 66 +++++++++++++++++++++++++++
 drivers/net/ethernet/cisco/enic/enic_clsf.h   | 10 ++++
 drivers/net/ethernet/cisco/enic/vnic_dev.c    | 61 +++++++++++++++++++++++++
 drivers/net/ethernet/cisco/enic/vnic_dev.h    |  2 +
 drivers/net/ethernet/cisco/enic/vnic_devcmd.h |  5 ++
 6 files changed, 145 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/cisco/enic/enic_clsf.c
 create mode 100644 drivers/net/ethernet/cisco/enic/enic_clsf.h

diff --git a/drivers/net/ethernet/cisco/enic/Makefile b/drivers/net/ethernet/cisco/enic/Makefile
index 239e1e4..aadcaf7 100644
--- a/drivers/net/ethernet/cisco/enic/Makefile
+++ b/drivers/net/ethernet/cisco/enic/Makefile
@@ -2,5 +2,5 @@ obj-$(CONFIG_ENIC) := enic.o
 
 enic-y := enic_main.o vnic_cq.o vnic_intr.o vnic_wq.o \
 	enic_res.o enic_dev.o enic_pp.o vnic_dev.o vnic_rq.o vnic_vic.o \
-	enic_ethtool.o enic_api.o
+	enic_ethtool.o enic_api.o enic_clsf.o
 
diff --git a/drivers/net/ethernet/cisco/enic/enic_clsf.c b/drivers/net/ethernet/cisco/enic/enic_clsf.c
new file mode 100644
index 0000000..f6703c4
--- /dev/null
+++ b/drivers/net/ethernet/cisco/enic/enic_clsf.c
@@ -0,0 +1,66 @@
+#include <linux/if.h>
+#include <linux/if_ether.h>
+#include <linux/if_link.h>
+#include <linux/netdevice.h>
+#include <linux/in.h>
+#include <linux/types.h>
+#include <linux/skbuff.h>
+#include <net/flow_keys.h>
+#include "enic_res.h"
+#include "enic_clsf.h"
+
+/* enic_addfltr_5t - Add ipv4 5tuple filter
+ *	@enic: enic struct of vnic
+ *	@keys: flow_keys of ipv4 5tuple
+ *	@rq: rq number to steer to
+ *
+ * This function returns filter_id(hardware_id) of the filter
+ * added. In case of error it returns an negative number.
+ */
+int enic_addfltr_5t(struct enic *enic, struct flow_keys *keys, u16 rq)
+{
+	int res;
+	struct filter data;
+
+	switch (keys->ip_proto) {
+	case IPPROTO_TCP:
+		data.u.ipv4.protocol = PROTO_TCP;
+		break;
+	case IPPROTO_UDP:
+		data.u.ipv4.protocol = PROTO_UDP;
+		break;
+	default:
+		return -EPROTONOSUPPORT;
+	};
+	data.type = FILTER_IPV4_5TUPLE;
+	data.u.ipv4.src_addr = ntohl(keys->src);
+	data.u.ipv4.dst_addr = ntohl(keys->dst);
+	data.u.ipv4.src_port = ntohs(keys->port16[0]);
+	data.u.ipv4.dst_port = ntohs(keys->port16[1]);
+	data.u.ipv4.flags = FILTER_FIELDS_IPV4_5TUPLE;
+
+	spin_lock_bh(&enic->devcmd_lock);
+	res = vnic_dev_classifier(enic->vdev, CLSF_ADD, &rq, &data);
+	spin_unlock_bh(&enic->devcmd_lock);
+	res = (res == 0) ? rq : res;
+
+	return res;
+}
+
+/* enic_delfltr - Delete clsf filter
+ *	@enic: enic struct of vnic
+ *	@filter_id: filter_is(hardware_id) of filter to be deleted
+ *
+ * This function returns zero in case of success, negative number incase of
+ * error.
+ */
+int enic_delfltr(struct enic *enic, u16 filter_id)
+{
+	int ret;
+
+	spin_lock_bh(&enic->devcmd_lock);
+	ret = vnic_dev_classifier(enic->vdev, CLSF_DEL, &filter_id, NULL);
+	spin_unlock_bh(&enic->devcmd_lock);
+
+	return ret;
+}
diff --git a/drivers/net/ethernet/cisco/enic/enic_clsf.h b/drivers/net/ethernet/cisco/enic/enic_clsf.h
new file mode 100644
index 0000000..b6925b3
--- /dev/null
+++ b/drivers/net/ethernet/cisco/enic/enic_clsf.h
@@ -0,0 +1,10 @@
+#ifndef _ENIC_CLSF_H_
+#define _ENIC_CLSF_H_
+
+#include "vnic_dev.h"
+#include "enic.h"
+
+int enic_addfltr_5t(struct enic *enic, struct flow_keys *keys, u16 rq);
+int enic_delfltr(struct enic *enic, u16 filter_id);
+
+#endif /* _ENIC_CLSF_H_ */
diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.c b/drivers/net/ethernet/cisco/enic/vnic_dev.c
index 263081b..5abc496 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_dev.c
+++ b/drivers/net/ethernet/cisco/enic/vnic_dev.c
@@ -1048,3 +1048,64 @@ int vnic_dev_set_mac_addr(struct vnic_dev *vdev, u8 *mac_addr)
 
 	return vnic_dev_cmd(vdev, CMD_SET_MAC_ADDR, &a0, &a1, wait);
 }
+
+/* vnic_dev_classifier: Add/Delete classifier entries
+ * @vdev: vdev of the device
+ * @cmd: CLSF_ADD for Add filter
+ *	 CLSF_DEL for Delete filter
+ * @entry: In case of ADD filter, the caller passes the RQ number in this
+ *	   variable.
+ *
+ *	   This function stores the filter_id returned by the firmware in the
+ *	   same variable before return;
+ *
+ *	   In case of DEL filter, the caller passes the RQ number. Return
+ *	   value is irrelevant.
+ * @data: filter data
+ */
+int vnic_dev_classifier(struct vnic_dev *vdev, u8 cmd, u16 *entry,
+			struct filter *data)
+{
+	u64 a0, a1;
+	int wait = 1000;
+	dma_addr_t tlv_pa;
+	int ret = -EINVAL;
+	struct filter_tlv *tlv, *tlv_va;
+	struct filter_action *action;
+	u64 tlv_size;
+
+	if (cmd == CLSF_ADD) {
+		tlv_size = sizeof(struct filter) +
+			   sizeof(struct filter_action) +
+			   2 * sizeof(struct filter_tlv);
+		tlv_va = pci_alloc_consistent(vdev->pdev, tlv_size, &tlv_pa);
+		if (!tlv_va)
+			return -ENOMEM;
+		tlv = tlv_va;
+		a0 = tlv_pa;
+		a1 = tlv_size;
+		memset(tlv, 0, tlv_size);
+		tlv->type = CLSF_TLV_FILTER;
+		tlv->length = sizeof(struct filter);
+		*(struct filter *)&tlv->val = *data;
+
+		tlv = (struct filter_tlv *)((char *)tlv +
+					    sizeof(struct filter_tlv) +
+					    sizeof(struct filter));
+
+		tlv->type = CLSF_TLV_ACTION;
+		tlv->length = sizeof(struct filter_action);
+		action = (struct filter_action *)&tlv->val;
+		action->type = FILTER_ACTION_RQ_STEERING;
+		action->u.rq_idx = *entry;
+
+		ret = vnic_dev_cmd(vdev, CMD_ADD_FILTER, &a0, &a1, wait);
+		*entry = (u16)a0;
+		pci_free_consistent(vdev->pdev, tlv_size, tlv_va, tlv_pa);
+	} else if (cmd == CLSF_DEL) {
+		a0 = *entry;
+		ret = vnic_dev_cmd(vdev, CMD_DEL_FILTER, &a0, &a1, wait);
+	}
+
+	return ret;
+}
diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.h b/drivers/net/ethernet/cisco/enic/vnic_dev.h
index 1f3b301..1fb214e 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_dev.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_dev.h
@@ -133,5 +133,7 @@ int vnic_dev_enable2(struct vnic_dev *vdev, int active);
 int vnic_dev_enable2_done(struct vnic_dev *vdev, int *status);
 int vnic_dev_deinit_done(struct vnic_dev *vdev, int *status);
 int vnic_dev_set_mac_addr(struct vnic_dev *vdev, u8 *mac_addr);
+int vnic_dev_classifier(struct vnic_dev *vdev, u8 cmd, u16 *entry,
+			struct filter *data);
 
 #endif /* _VNIC_DEV_H_ */
diff --git a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
index b9a0d78..435d0cd 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_devcmd.h
@@ -603,6 +603,11 @@ struct filter_tlv {
 	u_int32_t val[0];
 };
 
+enum {
+	CLSF_ADD = 0,
+	CLSF_DEL = 1,
+};
+
 /*
  * Writing cmd register causes STAT_BUSY to get set in status register.
  * When cmd completes, STAT_BUSY will be cleared.
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH net-next 4/8] enic: alloc/free rx_cpu_rmap
  2014-06-09 18:32 [PATCH net-next 0/8] enic updates Govindarajulu Varadarajan
                   ` (2 preceding siblings ...)
  2014-06-09 18:32 ` [PATCH net-next 3/8] enic: devcmd for adding IP 5 tuple hardware filters Govindarajulu Varadarajan
@ 2014-06-09 18:32 ` Govindarajulu Varadarajan
  2014-06-09 18:44   ` Sergei Shtylyov
  2014-06-09 18:32 ` [PATCH net-next 5/8] enic: Add Accelerated RFS support Govindarajulu Varadarajan
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Govindarajulu Varadarajan @ 2014-06-09 18:32 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, gvaradar, benve, _govind

rx_cpu_rmap provides the reverse irq cpu affinity. This patch allocates and
sets drivers netdev->rx_cpu_rmap accordingly.

rx_cpu_rmap is set in enic_request_intr() which is called by enic_open and
rx_cpu_rmap is freed in enic_free_intr() which is called by enic_stop.

This is used by Accelerated RFS.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
 drivers/net/ethernet/cisco/enic/enic_main.c | 36 +++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index f32f828..f4508d9 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -39,6 +39,9 @@
 #include <linux/prefetch.h>
 #include <net/ip6_checksum.h>
 #include <linux/ktime.h>
+#ifdef CONFIG_RFS_ACCEL
+#include <linux/cpu_rmap.h>
+#endif
 
 #include "cq_enet_desc.h"
 #include "vnic_dev.h"
@@ -1192,6 +1195,33 @@ static void enic_calc_int_moderation(struct enic *enic, struct vnic_rq *rq)
 	pkt_size_counter->small_pkt_bytes_cnt = 0;
 }
 
+#ifdef CONFIG_RFS_ACCEL
+static void enic_free_rx_cpu_rmap(struct enic *enic)
+{
+	free_irq_cpu_rmap(enic->netdev->rx_cpu_rmap);
+	enic->netdev->rx_cpu_rmap = NULL;
+}
+
+static inline void enic_set_rx_cpu_rmap(struct enic *enic)
+{
+	int i, res;
+
+	if (vnic_dev_get_intr_mode(enic->vdev) == VNIC_DEV_INTR_MODE_MSIX) {
+		enic->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(enic->rq_count);
+		if (unlikely(!enic->netdev->rx_cpu_rmap))
+			return;
+		for (i = 0; i < enic->rq_count; i++) {
+			res = irq_cpu_rmap_add(enic->netdev->rx_cpu_rmap,
+					       enic->msix_entry[i].vector);
+			if (unlikely(res)) {
+				enic_free_rx_cpu_rmap(enic);
+				return;
+			}
+		}
+	}
+}
+#endif
+
 static int enic_poll_msix(struct napi_struct *napi, int budget)
 {
 	struct net_device *netdev = napi->dev;
@@ -1267,6 +1297,9 @@ static void enic_free_intr(struct enic *enic)
 	struct net_device *netdev = enic->netdev;
 	unsigned int i;
 
+#ifdef CONFIG_RFS_ACCEL
+	enic_free_rx_cpu_rmap(enic);
+#endif
 	switch (vnic_dev_get_intr_mode(enic->vdev)) {
 	case VNIC_DEV_INTR_MODE_INTX:
 		free_irq(enic->pdev->irq, netdev);
@@ -1291,6 +1324,9 @@ static int enic_request_intr(struct enic *enic)
 	unsigned int i, intr;
 	int err = 0;
 
+#ifdef CONFIG_RFS_ACCEL
+	enic_set_rx_cpu_rmap(enic);
+#endif
 	switch (vnic_dev_get_intr_mode(enic->vdev)) {
 
 	case VNIC_DEV_INTR_MODE_INTX:
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH net-next 5/8] enic: Add Accelerated RFS support
  2014-06-09 18:32 [PATCH net-next 0/8] enic updates Govindarajulu Varadarajan
                   ` (3 preceding siblings ...)
  2014-06-09 18:32 ` [PATCH net-next 4/8] enic: alloc/free rx_cpu_rmap Govindarajulu Varadarajan
@ 2014-06-09 18:32 ` Govindarajulu Varadarajan
  2014-06-09 18:32 ` [PATCH net-next 6/8] enic: fix lockdep around devcmd_lock Govindarajulu Varadarajan
  2014-06-09 18:32 ` [PATCH net-next 7/8] enic: add low latency socket busy_poll support Govindarajulu Varadarajan
  6 siblings, 0 replies; 36+ messages in thread
From: Govindarajulu Varadarajan @ 2014-06-09 18:32 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, gvaradar, benve, _govind

This patch adds supports for Accelerated Receive Flow Steering.

When the desired rx is different from current rq, for a flow, kernel calls the
driver function enic_rx_flow_steer(). enic_rx_flow_steer adds a IP-TCP/UDP
hardware filter.

Driver registers a timer function enic_flow_may_expire. This function is called
every HZ/4 seconds. In this function we check if the added filter has expired
by calling rps_may_expire_flow(). If the flow has expired, it removes the hw
filter.

As of now adaptor supports only IPv4 - TCP/UDP filters.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
 drivers/net/ethernet/cisco/enic/enic.h      |  41 ++++++
 drivers/net/ethernet/cisco/enic/enic_clsf.c | 203 ++++++++++++++++++++++++++++
 drivers/net/ethernet/cisco/enic/enic_clsf.h |   9 ++
 drivers/net/ethernet/cisco/enic/enic_main.c |  17 +++
 drivers/net/ethernet/cisco/enic/enic_res.c  |   1 +
 drivers/net/ethernet/cisco/enic/vnic_enet.h |   2 +
 6 files changed, 273 insertions(+)

diff --git a/drivers/net/ethernet/cisco/enic/enic.h b/drivers/net/ethernet/cisco/enic/enic.h
index 14f465f..b9b9178 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -99,6 +99,44 @@ struct enic_port_profile {
 	u8 mac_addr[ETH_ALEN];
 };
 
+#ifdef CONFIG_RFS_ACCEL
+/* enic_rfs_fltr_node - rfs filter node in hash table
+ *	@@keys: IPv4 5 tuple
+ *	@flow_id: flow_id of clsf filter provided by kernel
+ *	@fltr_id: filter id of clsf filter returned by adaptor
+ *	@rq_id: desired rq index
+ *	@node: hlist_node
+ */
+struct enic_rfs_fltr_node {
+	struct flow_keys keys;
+	u32 flow_id;
+	u16 fltr_id;
+	u16 rq_id;
+	struct hlist_node node;
+};
+
+/* enic_rfs_flw_tbl - rfs flow table
+ *	@max: Maximum number of filters vNIC supports
+ *	@free: Number of free filters available
+ *	@toclean: hash table index to clean next
+ *	@ht_head: hash table list head
+ *	@lock: spin lock
+ *	@rfs_may_expire: timer function for enic_rps_may_expire_flow
+ */
+struct enic_rfs_flw_tbl {
+	u16 max;
+	int free;
+
+#define ENIC_RFS_FLW_BITSHIFT	(10)
+#define ENIC_RFS_FLW_MASK	((1 << ENIC_RFS_FLW_BITSHIFT) - 1)
+	u16 toclean:ENIC_RFS_FLW_BITSHIFT;
+	struct hlist_head ht_head[1 << ENIC_RFS_FLW_BITSHIFT];
+	spinlock_t lock;
+	struct timer_list rfs_may_expire;
+};
+
+#endif /* CONFIG_RFS_ACCEL */
+
 /* Per-instance private data structure */
 struct enic {
 	struct net_device *netdev;
@@ -150,6 +188,9 @@ struct enic {
 	/* completion queue cache line section */
 	____cacheline_aligned struct vnic_cq cq[ENIC_CQ_MAX];
 	unsigned int cq_count;
+#ifdef CONFIG_RFS_ACCEL
+	struct enic_rfs_flw_tbl rfs_h;
+#endif
 };
 
 static inline struct device *enic_get_dev(struct enic *enic)
diff --git a/drivers/net/ethernet/cisco/enic/enic_clsf.c b/drivers/net/ethernet/cisco/enic/enic_clsf.c
index f6703c4..1cabcfc 100644
--- a/drivers/net/ethernet/cisco/enic/enic_clsf.c
+++ b/drivers/net/ethernet/cisco/enic/enic_clsf.c
@@ -64,3 +64,206 @@ int enic_delfltr(struct enic *enic, u16 filter_id)
 
 	return ret;
 }
+
+#ifdef CONFIG_RFS_ACCEL
+void enic_flow_may_expire(unsigned long data)
+{
+	struct enic *enic = (struct enic *) data;
+	bool res;
+	int j;
+
+	spin_lock(&enic->rfs_h.lock);
+	for (j = 0; j < ENIC_CLSF_EXPIRE_COUNT; j++) {
+		struct hlist_head *hhead;
+		struct hlist_node *tmp;
+		struct enic_rfs_fltr_node *n;
+
+		hhead = &enic->rfs_h.ht_head[enic->rfs_h.toclean++];
+		hlist_for_each_entry_safe(n, tmp, hhead, node) {
+			res = rps_may_expire_flow(enic->netdev, n->rq_id,
+						  n->flow_id, n->fltr_id);
+			if (res) {
+				res = enic_delfltr(enic, n->fltr_id);
+				if (unlikely(res))
+					continue;
+				hlist_del(&n->node);
+				kfree(n);
+				enic->rfs_h.free++;
+			}
+		}
+	}
+	spin_unlock(&enic->rfs_h.lock);
+	mod_timer(&enic->rfs_h.rfs_may_expire, jiffies + HZ/4);
+}
+
+/* enic_rfs_flw_tbl_init - initialize enic->rfs_h members
+ *	@enic: enic data
+ */
+void enic_rfs_flw_tbl_init(struct enic *enic)
+{
+	int i;
+
+	spin_lock_init(&enic->rfs_h.lock);
+	for (i = 0; i <= ENIC_RFS_FLW_MASK; i++)
+		INIT_HLIST_HEAD(&enic->rfs_h.ht_head[i]);
+	enic->rfs_h.max = enic->config.num_arfs;
+	enic->rfs_h.free = enic->rfs_h.max;
+	enic->rfs_h.toclean = 0;
+	init_timer(&enic->rfs_h.rfs_may_expire);
+	enic->rfs_h.rfs_may_expire.function = enic_flow_may_expire;
+	enic->rfs_h.rfs_may_expire.data = (unsigned long)enic;
+	mod_timer(&enic->rfs_h.rfs_may_expire, jiffies + HZ/4);
+}
+
+void enic_rfs_flw_tbl_free(struct enic *enic)
+{
+	int i, res;
+
+	del_timer_sync(&enic->rfs_h.rfs_may_expire);
+	spin_lock(&enic->rfs_h.lock);
+	enic->rfs_h.free = 0;
+	for (i = 0; i < (1 << ENIC_RFS_FLW_BITSHIFT); i++) {
+		struct hlist_head *hhead;
+		struct hlist_node *tmp;
+		struct enic_rfs_fltr_node *n;
+
+		hhead = &enic->rfs_h.ht_head[i];
+		hlist_for_each_entry_safe(n, tmp, hhead, node) {
+			enic_delfltr(enic, n->fltr_id);
+			hlist_del(&n->node);
+			kfree(n);
+		}
+	}
+	spin_unlock(&enic->rfs_h.lock);
+}
+
+static struct enic_rfs_fltr_node *htbl_key_search(struct hlist_head *h,
+						  struct flow_keys *k)
+{
+	struct enic_rfs_fltr_node *tpos;
+
+	hlist_for_each_entry(tpos, h, node)
+		if (tpos->keys.src == k->src &&
+		    tpos->keys.dst == k->dst &&
+		    tpos->keys.ports == k->ports &&
+		    tpos->keys.ip_proto == k->ip_proto &&
+		    tpos->keys.n_proto == k->n_proto)
+			return tpos;
+	return NULL;
+}
+
+int enic_rx_flow_steer(struct net_device *dev, const struct sk_buff *skb,
+		       u16 rxq_index, u32 flow_id)
+{
+	struct flow_keys keys;
+	struct enic_rfs_fltr_node *n;
+	struct enic *enic;
+	u16 tbl_idx;
+	int res, i;
+
+	enic = netdev_priv(dev);
+	res = skb_flow_dissect(skb, &keys);
+	if (!res || keys.n_proto != htons(ETH_P_IP) ||
+	    (keys.ip_proto != IPPROTO_TCP && keys.ip_proto != IPPROTO_UDP))
+		return -EPROTONOSUPPORT;
+
+	tbl_idx = skb_get_hash_raw(skb) & ENIC_RFS_FLW_MASK;
+	spin_lock(&enic->rfs_h.lock);
+	n = htbl_key_search(&enic->rfs_h.ht_head[tbl_idx], &keys);
+
+	if (n) { /* entry already present  */
+		if (rxq_index == n->rq_id) {
+			res = -EEXIST;
+			goto ret_unlock;
+		}
+
+		/* desired rq changed for the flow, we need to delete
+		 * old fltr and add new one
+		 *
+		 * The moment we delete the fltr, the upcoming pkts
+		 * are put it default rq based on rss. When we add
+		 * new filter, upcoming pkts are put in desired queue.
+		 * This could cause ooo pkts.
+		 *
+		 * Lets 1st try adding new fltr and then del old one.
+		 */
+		i = --enic->rfs_h.free;
+		/* clsf tbl is full, we have to del old fltr first*/
+		if (unlikely(i < 0)) {
+			enic->rfs_h.free++;
+			res = enic_delfltr(enic, n->fltr_id);
+			if (unlikely(res < 0))
+				goto ret_unlock;
+			res = enic_addfltr_5t(enic, &keys, rxq_index);
+			if (res < 0) {
+				hlist_del(&n->node);
+				enic->rfs_h.free++;
+				goto ret_unlock;
+			}
+		/* add new fltr 1st then del old fltr */
+		} else {
+			int ret;
+
+			res = enic_addfltr_5t(enic, &keys, rxq_index);
+			if (res < 0) {
+				enic->rfs_h.free++;
+				goto ret_unlock;
+			}
+			ret = enic_delfltr(enic, n->fltr_id);
+			/* deleting old fltr failed. Add old fltr to list.
+			 * enic_flow_may_expire() will try to delete it later.
+			 */
+			if (unlikely(ret < 0)) {
+				struct enic_rfs_fltr_node *d;
+				struct hlist_head *head;
+
+				head = &enic->rfs_h.ht_head[tbl_idx];
+				d = kmalloc(sizeof(*d), GFP_ATOMIC);
+				if (d) {
+					d->fltr_id = n->fltr_id;
+					INIT_HLIST_NODE(&d->node);
+					hlist_add_head(&d->node, head);
+				}
+			} else {
+				enic->rfs_h.free++;
+			}
+		}
+		n->rq_id = rxq_index;
+		n->fltr_id = res;
+		n->flow_id = flow_id;
+	/* entry not present */
+	} else {
+		i = --enic->rfs_h.free;
+		if (i <= 0) {
+			enic->rfs_h.free++;
+			res = -EBUSY;
+			goto ret_unlock;
+		}
+
+		n = kmalloc(sizeof(*n), GFP_ATOMIC);
+		if (!n) {
+			res = -ENOMEM;
+			enic->rfs_h.free++;
+			goto ret_unlock;
+		}
+
+		res = enic_addfltr_5t(enic, &keys, rxq_index);
+		if (res < 0) {
+			kfree(n);
+			enic->rfs_h.free++;
+			goto ret_unlock;
+		}
+		n->rq_id = rxq_index;
+		n->fltr_id = res;
+		n->flow_id = flow_id;
+		n->keys = keys;
+		INIT_HLIST_NODE(&n->node);
+		hlist_add_head(&n->node, &enic->rfs_h.ht_head[tbl_idx]);
+	}
+
+ret_unlock:
+	spin_unlock(&enic->rfs_h.lock);
+	return res;
+}
+
+#endif /* CONFIG_RFS_ACCEL */
diff --git a/drivers/net/ethernet/cisco/enic/enic_clsf.h b/drivers/net/ethernet/cisco/enic/enic_clsf.h
index b6925b3..76a85bb 100644
--- a/drivers/net/ethernet/cisco/enic/enic_clsf.h
+++ b/drivers/net/ethernet/cisco/enic/enic_clsf.h
@@ -4,7 +4,16 @@
 #include "vnic_dev.h"
 #include "enic.h"
 
+#define ENIC_CLSF_EXPIRE_COUNT 128
+
 int enic_addfltr_5t(struct enic *enic, struct flow_keys *keys, u16 rq);
 int enic_delfltr(struct enic *enic, u16 filter_id);
 
+#ifdef CONFIG_RFS_ACCEL
+void enic_rfs_flw_tbl_init(struct enic *enic);
+void enic_rfs_flw_tbl_free(struct enic *enic);
+int enic_rx_flow_steer(struct net_device *dev, const struct sk_buff *skb,
+		       u16 rxq_index, u32 flow_id);
+#endif /* CONFIG_RFS_ACCEL */
+
 #endif /* _ENIC_CLSF_H_ */
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index f4508d9..67a12a7 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -52,6 +52,7 @@
 #include "enic.h"
 #include "enic_dev.h"
 #include "enic_pp.h"
+#include "enic_clsf.h"
 
 #define ENIC_NOTIFY_TIMER_PERIOD	(2 * HZ)
 #define WQ_ENET_MAX_DESC_LEN		(1 << WQ_ENET_LEN_BITS)
@@ -1539,6 +1540,9 @@ static int enic_open(struct net_device *netdev)
 		vnic_intr_unmask(&enic->intr[i]);
 
 	enic_notify_timer_start(enic);
+#ifdef CONFIG_RFS_ACCEL
+	enic_rfs_flw_tbl_init(enic);
+#endif
 
 	return 0;
 
@@ -1565,6 +1569,9 @@ static int enic_stop(struct net_device *netdev)
 	enic_synchronize_irqs(enic);
 
 	del_timer_sync(&enic->notify_timer);
+#ifdef CONFIG_RFS_ACCEL
+	enic_rfs_flw_tbl_free(enic);
+#endif
 
 	enic_dev_disable(enic);
 
@@ -2057,6 +2064,9 @@ static const struct net_device_ops enic_netdev_dynamic_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= enic_poll_controller,
 #endif
+#ifdef CONFIG_RFS_ACCEL
+	.ndo_rx_flow_steer	= enic_rx_flow_steer,
+#endif
 };
 
 static const struct net_device_ops enic_netdev_ops = {
@@ -2077,6 +2087,9 @@ static const struct net_device_ops enic_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= enic_poll_controller,
 #endif
+#ifdef CONFIG_RFS_ACCEL
+	.ndo_rx_flow_steer	= enic_rx_flow_steer,
+#endif
 };
 
 static void enic_dev_deinit(struct enic *enic)
@@ -2422,6 +2435,10 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	netdev->features |= netdev->hw_features;
 
+#ifdef CONFIG_RFS_ACCEL
+	netdev->hw_features |= NETIF_F_NTUPLE;
+#endif
+
 	if (using_dac)
 		netdev->features |= NETIF_F_HIGHDMA;
 
diff --git a/drivers/net/ethernet/cisco/enic/enic_res.c b/drivers/net/ethernet/cisco/enic/enic_res.c
index 31d6588..9c96911 100644
--- a/drivers/net/ethernet/cisco/enic/enic_res.c
+++ b/drivers/net/ethernet/cisco/enic/enic_res.c
@@ -71,6 +71,7 @@ int enic_get_vnic_config(struct enic *enic)
 	GET_CONFIG(intr_mode);
 	GET_CONFIG(intr_timer_usec);
 	GET_CONFIG(loop_tag);
+	GET_CONFIG(num_arfs);
 
 	c->wq_desc_count =
 		min_t(u32, ENIC_MAX_WQ_DESCS,
diff --git a/drivers/net/ethernet/cisco/enic/vnic_enet.h b/drivers/net/ethernet/cisco/enic/vnic_enet.h
index 6095428..75aced2 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_enet.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_enet.h
@@ -32,6 +32,8 @@ struct vnic_enet_config {
 	char devname[16];
 	u32 intr_timer_usec;
 	u16 loop_tag;
+	u16 vf_rq_count;
+	u16 num_arfs;
 };
 
 #define VENETF_TSO		0x1	/* TSO enabled */
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH net-next 6/8] enic: fix lockdep around devcmd_lock
  2014-06-09 18:32 [PATCH net-next 0/8] enic updates Govindarajulu Varadarajan
                   ` (4 preceding siblings ...)
  2014-06-09 18:32 ` [PATCH net-next 5/8] enic: Add Accelerated RFS support Govindarajulu Varadarajan
@ 2014-06-09 18:32 ` Govindarajulu Varadarajan
  2014-06-09 18:32 ` [PATCH net-next 7/8] enic: add low latency socket busy_poll support Govindarajulu Varadarajan
  6 siblings, 0 replies; 36+ messages in thread
From: Govindarajulu Varadarajan @ 2014-06-09 18:32 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, gvaradar, benve, _govind, Tony Camuso

From: Tony Camuso <tcamuso@redhat.com>

We were experiencing occasional "BUG: scheduling while atomic" splats
in our testing. Enabling DEBUG_SPINLOCK and DEBUG_LOCKDEP in the kernel
exposed a lockdep in the enic driver.

enic 0000:0b:00.0 eth2: Link UP
======================================================
[ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
3.12.0-rc1.x86_64-dbg+ #2 Tainted: GF       W
------------------------------------------------------
NetworkManager/4209 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
(&(&enic->devcmd_lock)->rlock){+.+...}, at: [<ffffffffa026b7e4>] enic_dev_packet_filter+0x44/0x90 [enic]

The fix was to replace spin_lock with spin_lock_bh for the enic
devcmd_lock, so that soft irqs would be disabled while the lock
is held.

Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
 drivers/net/ethernet/cisco/enic/enic_api.c  |  4 +-
 drivers/net/ethernet/cisco/enic/enic_dev.c  | 80 ++++++++++++++---------------
 drivers/net/ethernet/cisco/enic/enic_dev.h  |  4 +-
 drivers/net/ethernet/cisco/enic/enic_main.c | 16 +++---
 4 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_api.c b/drivers/net/ethernet/cisco/enic/enic_api.c
index e13efbd..b161f24 100644
--- a/drivers/net/ethernet/cisco/enic/enic_api.c
+++ b/drivers/net/ethernet/cisco/enic/enic_api.c
@@ -34,13 +34,13 @@ int enic_api_devcmd_proxy_by_index(struct net_device *netdev, int vf,
 	struct vnic_dev *vdev = enic->vdev;
 
 	spin_lock(&enic->enic_api_lock);
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 
 	vnic_dev_cmd_proxy_by_index_start(vdev, vf);
 	err = vnic_dev_cmd(vdev, cmd, a0, a1, wait);
 	vnic_dev_cmd_proxy_end(vdev);
 
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 	spin_unlock(&enic->enic_api_lock);
 
 	return err;
diff --git a/drivers/net/ethernet/cisco/enic/enic_dev.c b/drivers/net/ethernet/cisco/enic/enic_dev.c
index 3e27df5..87ddc44 100644
--- a/drivers/net/ethernet/cisco/enic/enic_dev.c
+++ b/drivers/net/ethernet/cisco/enic/enic_dev.c
@@ -29,9 +29,9 @@ int enic_dev_fw_info(struct enic *enic, struct vnic_devcmd_fw_info **fw_info)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_fw_info(enic->vdev, fw_info);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -40,9 +40,9 @@ int enic_dev_stats_dump(struct enic *enic, struct vnic_stats **vstats)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_stats_dump(enic->vdev, vstats);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -54,9 +54,9 @@ int enic_dev_add_station_addr(struct enic *enic)
 	if (!is_valid_ether_addr(enic->netdev->dev_addr))
 		return -EADDRNOTAVAIL;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_add_addr(enic->vdev, enic->netdev->dev_addr);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -68,9 +68,9 @@ int enic_dev_del_station_addr(struct enic *enic)
 	if (!is_valid_ether_addr(enic->netdev->dev_addr))
 		return -EADDRNOTAVAIL;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_del_addr(enic->vdev, enic->netdev->dev_addr);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -80,10 +80,10 @@ int enic_dev_packet_filter(struct enic *enic, int directed, int multicast,
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_packet_filter(enic->vdev, directed,
 		multicast, broadcast, promisc, allmulti);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -92,9 +92,9 @@ int enic_dev_add_addr(struct enic *enic, const u8 *addr)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_add_addr(enic->vdev, addr);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -103,9 +103,9 @@ int enic_dev_del_addr(struct enic *enic, const u8 *addr)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_del_addr(enic->vdev, addr);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -114,9 +114,9 @@ int enic_dev_notify_unset(struct enic *enic)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_notify_unset(enic->vdev);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -125,9 +125,9 @@ int enic_dev_hang_notify(struct enic *enic)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_hang_notify(enic->vdev);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -136,10 +136,10 @@ int enic_dev_set_ig_vlan_rewrite_mode(struct enic *enic)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_set_ig_vlan_rewrite_mode(enic->vdev,
 		IG_VLAN_REWRITE_MODE_PRIORITY_TAG_DEFAULT_VLAN);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -148,9 +148,9 @@ int enic_dev_enable(struct enic *enic)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_enable_wait(enic->vdev);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -159,9 +159,9 @@ int enic_dev_disable(struct enic *enic)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_disable(enic->vdev);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -170,9 +170,9 @@ int enic_dev_intr_coal_timer_info(struct enic *enic)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_intr_coal_timer_info(enic->vdev);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -181,9 +181,9 @@ int enic_vnic_dev_deinit(struct enic *enic)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_deinit(enic->vdev);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -192,10 +192,10 @@ int enic_dev_init_prov2(struct enic *enic, struct vic_provinfo *vp)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_init_prov2(enic->vdev,
 		(u8 *)vp, vic_provinfo_size(vp));
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -204,9 +204,9 @@ int enic_dev_deinit_done(struct enic *enic, int *status)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_deinit_done(enic->vdev, status);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -217,9 +217,9 @@ int enic_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, u16 vid)
 	struct enic *enic = netdev_priv(netdev);
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = enic_add_vlan(enic, vid);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -230,9 +230,9 @@ int enic_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto, u16 vid)
 	struct enic *enic = netdev_priv(netdev);
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = enic_del_vlan(enic, vid);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -241,9 +241,9 @@ int enic_dev_enable2(struct enic *enic, int active)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_enable2(enic->vdev, active);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -252,9 +252,9 @@ int enic_dev_enable2_done(struct enic *enic, int *status)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = vnic_dev_enable2_done(enic->vdev, status);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
diff --git a/drivers/net/ethernet/cisco/enic/enic_dev.h b/drivers/net/ethernet/cisco/enic/enic_dev.h
index 36ea1ab..10bb970 100644
--- a/drivers/net/ethernet/cisco/enic/enic_dev.h
+++ b/drivers/net/ethernet/cisco/enic/enic_dev.h
@@ -28,7 +28,7 @@
  */
 #define ENIC_DEVCMD_PROXY_BY_INDEX(vf, err, enic, vnicdevcmdfn, ...) \
 	do { \
-		spin_lock(&enic->devcmd_lock); \
+		spin_lock_bh(&enic->devcmd_lock); \
 		if (enic_is_valid_vf(enic, vf)) { \
 			vnic_dev_cmd_proxy_by_index_start(enic->vdev, vf); \
 			err = vnicdevcmdfn(enic->vdev, ##__VA_ARGS__); \
@@ -36,7 +36,7 @@
 		} else { \
 			err = vnicdevcmdfn(enic->vdev, ##__VA_ARGS__); \
 		} \
-		spin_unlock(&enic->devcmd_lock); \
+		spin_unlock_bh(&enic->devcmd_lock); \
 	} while (0)
 
 int enic_dev_fw_info(struct enic *enic, struct vnic_devcmd_fw_info **fw_info);
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 67a12a7..33decff 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -1458,7 +1458,7 @@ static int enic_dev_notify_set(struct enic *enic)
 {
 	int err;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	switch (vnic_dev_get_intr_mode(enic->vdev)) {
 	case VNIC_DEV_INTR_MODE_INTX:
 		err = vnic_dev_notify_set(enic->vdev,
@@ -1472,7 +1472,7 @@ static int enic_dev_notify_set(struct enic *enic)
 		err = vnic_dev_notify_set(enic->vdev, -1 /* no intr */);
 		break;
 	}
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
@@ -1801,11 +1801,11 @@ static int enic_set_rsskey(struct enic *enic)
 
 	memcpy(rss_key_buf_va, &rss_key, sizeof(union vnic_rss_key));
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = enic_set_rss_key(enic,
 		rss_key_buf_pa,
 		sizeof(union vnic_rss_key));
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	pci_free_consistent(enic->pdev, sizeof(union vnic_rss_key),
 		rss_key_buf_va, rss_key_buf_pa);
@@ -1828,11 +1828,11 @@ static int enic_set_rsscpu(struct enic *enic, u8 rss_hash_bits)
 	for (i = 0; i < (1 << rss_hash_bits); i++)
 		(*rss_cpu_buf_va).cpu[i/4].b[i%4] = i % enic->rq_count;
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = enic_set_rss_cpu(enic,
 		rss_cpu_buf_pa,
 		sizeof(union vnic_rss_cpu));
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	pci_free_consistent(enic->pdev, sizeof(union vnic_rss_cpu),
 		rss_cpu_buf_va, rss_cpu_buf_pa);
@@ -1850,13 +1850,13 @@ static int enic_set_niccfg(struct enic *enic, u8 rss_default_cpu,
 	/* Enable VLAN tag stripping.
 	*/
 
-	spin_lock(&enic->devcmd_lock);
+	spin_lock_bh(&enic->devcmd_lock);
 	err = enic_set_nic_cfg(enic,
 		rss_default_cpu, rss_hash_type,
 		rss_hash_bits, rss_base_cpu,
 		rss_enable, tso_ipid_split_en,
 		ig_vlan_strip_en);
-	spin_unlock(&enic->devcmd_lock);
+	spin_unlock_bh(&enic->devcmd_lock);
 
 	return err;
 }
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH net-next 7/8] enic: add low latency socket busy_poll support
  2014-06-09 18:32 [PATCH net-next 0/8] enic updates Govindarajulu Varadarajan
                   ` (5 preceding siblings ...)
  2014-06-09 18:32 ` [PATCH net-next 6/8] enic: fix lockdep around devcmd_lock Govindarajulu Varadarajan
@ 2014-06-09 18:32 ` Govindarajulu Varadarajan
  6 siblings, 0 replies; 36+ messages in thread
From: Govindarajulu Varadarajan @ 2014-06-09 18:32 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, gvaradar, benve, _govind

This patch adds support for low latency busy_poll.

* Introduce drivers ndo_busy_poll function enic_busy_poll, which is called by
socket waiting for data.

* Introduce locking between napi_poll nad busy_poll

* enic_busy_poll cleans up all the rx pkts possible. While in busy_poll, rq
holds the state ENIC_POLL_STATE_POLL. While in napi_poll, rq holds the state
ENIC_POLL_STATE_NAPI.

* in napi_poll we return if we are in busy_poll. Incase of INTx & msix, we just
service wq and return if busy_poll is going on.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
 drivers/net/ethernet/cisco/enic/enic_main.c |  85 ++++++++++++++++---
 drivers/net/ethernet/cisco/enic/vnic_rq.h   | 122 ++++++++++++++++++++++++++++
 2 files changed, 195 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 33decff..558ee45 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -42,6 +42,9 @@
 #ifdef CONFIG_RFS_ACCEL
 #include <linux/cpu_rmap.h>
 #endif
+#ifdef CONFIG_NET_RX_BUSY_POLL
+#include <net/busy_poll.h>
+#endif
 
 #include "cq_enet_desc.h"
 #include "vnic_dev.h"
@@ -1053,10 +1056,12 @@ static void enic_rq_indicate_buf(struct vnic_rq *rq,
 		if (vlan_stripped)
 			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tci);
 
-		if (netdev->features & NETIF_F_GRO)
-			napi_gro_receive(&enic->napi[q_number], skb);
-		else
+		skb_mark_napi_id(skb, &enic->napi[rq->index]);
+		if (enic_poll_busy_polling(rq) ||
+		    !(netdev->features & NETIF_F_GRO))
 			netif_receive_skb(skb);
+		else
+			napi_gro_receive(&enic->napi[q_number], skb);
 		if (enic->rx_coalesce_setting.use_adaptive_rx_coalesce)
 			enic_intr_update_pkt_size(&cq->pkt_size_counter,
 						  bytes_written);
@@ -1093,16 +1098,22 @@ static int enic_poll(struct napi_struct *napi, int budget)
 	unsigned int  work_done, rq_work_done = 0, wq_work_done;
 	int err;
 
-	/* Service RQ (first) and WQ
-	 */
+	wq_work_done = vnic_cq_service(&enic->cq[cq_wq], wq_work_to_do,
+				       enic_wq_service, NULL);
+
+	if (!enic_poll_lock_napi(&enic->rq[cq_rq])) {
+		if (wq_work_done > 0)
+			vnic_intr_return_credits(&enic->intr[intr],
+						 wq_work_done,
+						 0 /* dont unmask intr */,
+						 0 /* dont reset intr timer */);
+		return rq_work_done;
+	}
 
 	if (budget > 0)
 		rq_work_done = vnic_cq_service(&enic->cq[cq_rq],
 			rq_work_to_do, enic_rq_service, NULL);
 
-	wq_work_done = vnic_cq_service(&enic->cq[cq_wq],
-		wq_work_to_do, enic_wq_service, NULL);
-
 	/* Accumulate intr event credits for this polling
 	 * cycle.  An intr event is the completion of a
 	 * a WQ or RQ packet.
@@ -1134,6 +1145,7 @@ static int enic_poll(struct napi_struct *napi, int budget)
 		napi_complete(napi);
 		vnic_intr_unmask(&enic->intr[intr]);
 	}
+	enic_poll_unlock_napi(&enic->rq[cq_rq]);
 
 	return rq_work_done;
 }
@@ -1223,6 +1235,34 @@ static inline void enic_set_rx_cpu_rmap(struct enic *enic)
 }
 #endif
 
+#ifdef CONFIG_NET_RX_BUSY_POLL
+int enic_busy_poll(struct napi_struct *napi)
+{
+	struct net_device *netdev = napi->dev;
+	struct enic *enic = netdev_priv(netdev);
+	unsigned int rq = (napi - &enic->napi[0]);
+	unsigned int cq = enic_cq_rq(enic, rq);
+	unsigned int intr = enic_msix_rq_intr(enic, rq);
+	unsigned int work_to_do = -1; /* clean all pkts possible */
+	unsigned int work_done;
+
+	if (!enic_poll_lock_poll(&enic->rq[rq]))
+		return LL_FLUSH_BUSY;
+	work_done = vnic_cq_service(&enic->cq[cq], work_to_do,
+				    enic_rq_service, NULL);
+
+	if (work_done > 0)
+		vnic_intr_return_credits(&enic->intr[intr],
+					 work_done, 0, 0);
+	vnic_rq_fill(&enic->rq[rq], enic_rq_alloc_buf);
+	if (enic->rx_coalesce_setting.use_adaptive_rx_coalesce)
+		enic_calc_int_moderation(enic, &enic->rq[rq]);
+	enic_poll_unlock_poll(&enic->rq[rq]);
+
+	return work_done;
+}
+#endif /* CONFIG_NET_RX_BUSY_POLL */
+
 static int enic_poll_msix(struct napi_struct *napi, int budget)
 {
 	struct net_device *netdev = napi->dev;
@@ -1234,6 +1274,8 @@ static int enic_poll_msix(struct napi_struct *napi, int budget)
 	unsigned int work_done = 0;
 	int err;
 
+	if (!enic_poll_lock_napi(&enic->rq[rq]))
+		return work_done;
 	/* Service RQ
 	 */
 
@@ -1279,6 +1321,7 @@ static int enic_poll_msix(struct napi_struct *napi, int budget)
 			enic_set_int_moderation(enic, &enic->rq[rq]);
 		vnic_intr_unmask(&enic->intr[intr]);
 	}
+	enic_poll_unlock_napi(&enic->rq[rq]);
 
 	return work_done;
 }
@@ -1531,8 +1574,10 @@ static int enic_open(struct net_device *netdev)
 
 	netif_tx_wake_all_queues(netdev);
 
-	for (i = 0; i < enic->rq_count; i++)
+	for (i = 0; i < enic->rq_count; i++) {
+		enic_busy_poll_init_lock(&enic->rq[i]);
 		napi_enable(&enic->napi[i]);
+	}
 
 	enic_dev_enable(enic);
 
@@ -1575,8 +1620,13 @@ static int enic_stop(struct net_device *netdev)
 
 	enic_dev_disable(enic);
 
-	for (i = 0; i < enic->rq_count; i++)
+	local_bh_disable();
+	for (i = 0; i < enic->rq_count; i++) {
 		napi_disable(&enic->napi[i]);
+		while (!enic_poll_lock_napi(&enic->rq[i]))
+			mdelay(1);
+	}
+	local_bh_enable();
 
 	netif_carrier_off(netdev);
 	netif_tx_disable(netdev);
@@ -2067,6 +2117,9 @@ static const struct net_device_ops enic_netdev_dynamic_ops = {
 #ifdef CONFIG_RFS_ACCEL
 	.ndo_rx_flow_steer	= enic_rx_flow_steer,
 #endif
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	.ndo_busy_poll		= enic_busy_poll,
+#endif
 };
 
 static const struct net_device_ops enic_netdev_ops = {
@@ -2090,14 +2143,19 @@ static const struct net_device_ops enic_netdev_ops = {
 #ifdef CONFIG_RFS_ACCEL
 	.ndo_rx_flow_steer	= enic_rx_flow_steer,
 #endif
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	.ndo_busy_poll		= enic_busy_poll,
+#endif
 };
 
 static void enic_dev_deinit(struct enic *enic)
 {
 	unsigned int i;
 
-	for (i = 0; i < enic->rq_count; i++)
+	for (i = 0; i < enic->rq_count; i++) {
+		napi_hash_del(&enic->napi[i]);
 		netif_napi_del(&enic->napi[i]);
+	}
 
 	enic_free_vnic_resources(enic);
 	enic_clear_intr_mode(enic);
@@ -2163,11 +2221,14 @@ static int enic_dev_init(struct enic *enic)
 	switch (vnic_dev_get_intr_mode(enic->vdev)) {
 	default:
 		netif_napi_add(netdev, &enic->napi[0], enic_poll, 64);
+		napi_hash_add(&enic->napi[0]);
 		break;
 	case VNIC_DEV_INTR_MODE_MSIX:
-		for (i = 0; i < enic->rq_count; i++)
+		for (i = 0; i < enic->rq_count; i++) {
 			netif_napi_add(netdev, &enic->napi[i],
 				enic_poll_msix, 64);
+			napi_hash_add(&enic->napi[i]);
+		}
 		break;
 	}
 
diff --git a/drivers/net/ethernet/cisco/enic/vnic_rq.h b/drivers/net/ethernet/cisco/enic/vnic_rq.h
index ee7bc95..8111d52 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_rq.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_rq.h
@@ -85,6 +85,21 @@ struct vnic_rq {
 	struct vnic_rq_buf *to_clean;
 	void *os_buf_head;
 	unsigned int pkts_outstanding;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+#define ENIC_POLL_STATE_IDLE		0
+#define ENIC_POLL_STATE_NAPI		(1 << 0) /* NAPI owns this poll */
+#define ENIC_POLL_STATE_POLL		(1 << 1) /* poll owns this poll */
+#define ENIC_POLL_STATE_NAPI_YIELD	(1 << 2) /* NAPI yielded this poll */
+#define ENIC_POLL_STATE_POLL_YIELD	(1 << 3) /* poll yielded this poll */
+#define ENIC_POLL_YIELD			(ENIC_POLL_STATE_NAPI_YIELD |	\
+					 ENIC_POLL_STATE_POLL_YIELD)
+#define ENIC_POLL_LOCKED		(ENIC_POLL_STATE_NAPI |		\
+					 ENIC_POLL_STATE_POLL)
+#define ENIC_POLL_USER_PEND		(ENIC_POLL_STATE_POLL |		\
+					 ENIC_POLL_STATE_POLL_YIELD)
+	unsigned int bpoll_state;
+	spinlock_t bpoll_lock;
+#endif /* CONFIG_NET_RX_BUSY_POLL */
 };
 
 static inline unsigned int vnic_rq_desc_avail(struct vnic_rq *rq)
@@ -197,6 +212,113 @@ static inline int vnic_rq_fill(struct vnic_rq *rq,
 	return 0;
 }
 
+#ifdef CONFIG_NET_RX_BUSY_POLL
+static inline void enic_busy_poll_init_lock(struct vnic_rq *rq)
+{
+	spin_lock_init(&rq->bpoll_lock);
+	rq->bpoll_state = ENIC_POLL_STATE_IDLE;
+}
+
+static inline bool enic_poll_lock_napi(struct vnic_rq *rq)
+{
+	bool rc = true;
+
+	spin_lock(&rq->bpoll_lock);
+	if (rq->bpoll_state & ENIC_POLL_LOCKED) {
+		WARN_ON(rq->bpoll_state & ENIC_POLL_STATE_NAPI);
+		rq->bpoll_state |= ENIC_POLL_STATE_NAPI_YIELD;
+		rc = false;
+	} else {
+		rq->bpoll_state = ENIC_POLL_STATE_NAPI;
+	}
+	spin_unlock(&rq->bpoll_lock);
+
+	return rc;
+}
+
+static inline bool enic_poll_unlock_napi(struct vnic_rq *rq)
+{
+	bool rc = false;
+
+	spin_lock(&rq->bpoll_lock);
+	WARN_ON(rq->bpoll_state &
+		(ENIC_POLL_STATE_POLL | ENIC_POLL_STATE_NAPI_YIELD));
+	if (rq->bpoll_state & ENIC_POLL_STATE_POLL_YIELD)
+		rc = true;
+	rq->bpoll_state = ENIC_POLL_STATE_IDLE;
+	spin_unlock(&rq->bpoll_lock);
+
+	return rc;
+}
+
+static inline bool enic_poll_lock_poll(struct vnic_rq *rq)
+{
+	bool rc = true;
+
+	spin_lock_bh(&rq->bpoll_lock);
+	if (rq->bpoll_state & ENIC_POLL_LOCKED) {
+		rq->bpoll_state |= ENIC_POLL_STATE_POLL_YIELD;
+		rc = false;
+	} else {
+		rq->bpoll_state |= ENIC_POLL_STATE_POLL;
+	}
+	spin_unlock_bh(&rq->bpoll_lock);
+
+	return rc;
+}
+
+static inline bool enic_poll_unlock_poll(struct vnic_rq *rq)
+{
+	bool rc = false;
+
+	spin_lock_bh(&rq->bpoll_lock);
+	WARN_ON(rq->bpoll_state & ENIC_POLL_STATE_NAPI);
+	if (rq->bpoll_state & ENIC_POLL_STATE_POLL_YIELD)
+		rc = true;
+	rq->bpoll_state = ENIC_POLL_STATE_IDLE;
+	spin_unlock_bh(&rq->bpoll_lock);
+
+	return rc;
+}
+
+static inline bool enic_poll_busy_polling(struct vnic_rq *rq)
+{
+	WARN_ON(!(rq->bpoll_state & ENIC_POLL_LOCKED));
+	return rq->bpoll_state & ENIC_POLL_USER_PEND;
+}
+
+#else
+
+static inline void enic_busy_poll_init_lock(struct vnic_rq *rq)
+{
+}
+
+static inline bool enic_poll_lock_napi(struct vnic_rq *rq)
+{
+	return true;
+}
+
+static inline bool enic_poll_unlock_napi(struct vnic_rq *rq)
+{
+	return false;
+}
+
+static inline bool enic_poll_lock_poll(struct vnic_rq *rq)
+{
+	return false;
+}
+
+static inline bool enic_poll_unlock_poll(struct vnic_rq *rq)
+{
+	return false;
+}
+
+static inline bool enic_poll_ll_polling(struct vnic_rq *rq)
+{
+	return false;
+}
+#endif /* CONFIG_NET_RX_BUSY_POLL */
+
 void vnic_rq_free(struct vnic_rq *rq);
 int vnic_rq_alloc(struct vnic_dev *vdev, struct vnic_rq *rq, unsigned int index,
 	unsigned int desc_count, unsigned int desc_size);
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 4/8] enic: alloc/free rx_cpu_rmap
  2014-06-09 18:32 ` [PATCH net-next 4/8] enic: alloc/free rx_cpu_rmap Govindarajulu Varadarajan
@ 2014-06-09 18:44   ` Sergei Shtylyov
  2014-06-10 13:52     ` Govindarajulu Varadarajan
  0 siblings, 1 reply; 36+ messages in thread
From: Sergei Shtylyov @ 2014-06-09 18:44 UTC (permalink / raw)
  To: Govindarajulu Varadarajan, davem, netdev; +Cc: ssujith, gvaradar, benve

Hello.

On 06/09/2014 10:32 PM, Govindarajulu Varadarajan wrote:

> rx_cpu_rmap provides the reverse irq cpu affinity. This patch allocates and
> sets drivers netdev->rx_cpu_rmap accordingly.

> rx_cpu_rmap is set in enic_request_intr() which is called by enic_open and
> rx_cpu_rmap is freed in enic_free_intr() which is called by enic_stop.

> This is used by Accelerated RFS.

> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
> ---
>   drivers/net/ethernet/cisco/enic/enic_main.c | 36 +++++++++++++++++++++++++++++
>   1 file changed, 36 insertions(+)

> diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
> index f32f828..f4508d9 100644
> --- a/drivers/net/ethernet/cisco/enic/enic_main.c
> +++ b/drivers/net/ethernet/cisco/enic/enic_main.c
[...]
> @@ -1192,6 +1195,33 @@ static void enic_calc_int_moderation(struct enic *enic, struct vnic_rq *rq)
>   	pkt_size_counter->small_pkt_bytes_cnt = 0;
>   }
>
> +#ifdef CONFIG_RFS_ACCEL
> +static void enic_free_rx_cpu_rmap(struct enic *enic)
> +{
> +	free_irq_cpu_rmap(enic->netdev->rx_cpu_rmap);
> +	enic->netdev->rx_cpu_rmap = NULL;
> +}
> +
> +static inline void enic_set_rx_cpu_rmap(struct enic *enic)

    No need to use *inline* in a .c file, the compiler should figure it out.

> +{
> +	int i, res;
> +
> +	if (vnic_dev_get_intr_mode(enic->vdev) == VNIC_DEV_INTR_MODE_MSIX) {
> +		enic->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(enic->rq_count);
> +		if (unlikely(!enic->netdev->rx_cpu_rmap))
> +			return;
> +		for (i = 0; i < enic->rq_count; i++) {
> +			res = irq_cpu_rmap_add(enic->netdev->rx_cpu_rmap,
> +					       enic->msix_entry[i].vector);
> +			if (unlikely(res)) {
> +				enic_free_rx_cpu_rmap(enic);
> +				return;
> +			}
> +		}
> +	}
> +}

    It's better to do the following here:

#else
static void enic_free_rx_cpu_rmap(struct enic *enic)
{
}
static void enic_set_rx_cpu_rmap(struct enic *enic)
{
}

> +#endif
> +
>   static int enic_poll_msix(struct napi_struct *napi, int budget)
>   {
>   	struct net_device *netdev = napi->dev;
> @@ -1267,6 +1297,9 @@ static void enic_free_intr(struct enic *enic)
>   	struct net_device *netdev = enic->netdev;
>   	unsigned int i;
>
> +#ifdef CONFIG_RFS_ACCEL
> +	enic_free_rx_cpu_rmap(enic);
> +#endif

    ... so that you can avoid #ifdef's at the call sites.

WBR, Sergei

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect()
  2014-06-09 18:32 ` [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect() Govindarajulu Varadarajan
@ 2014-06-10 13:11   ` Daniel Borkmann
  2014-06-10 14:08     ` Govindarajulu Varadarajan
  2014-06-19 18:06   ` Chema Gonzalez
  2014-06-24  5:29   ` Yinghai Lu
  2 siblings, 1 reply; 36+ messages in thread
From: Daniel Borkmann @ 2014-06-10 13:11 UTC (permalink / raw)
  To: Govindarajulu Varadarajan
  Cc: davem, netdev, ssujith, gvaradar, benve, eric.dumazet



On 06/09/2014 08:32 PM, Govindarajulu Varadarajan wrote:
> skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
> give any information about IPv4 or IPv6.
>
> This patch adds new member, n_proto, to struct flow_keys. Which records the
> IP layer type. i.e IPv4 or IPv6.
>
> This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
>
> Adding new member to flow_keys increases the struct size by around 4 bytes.
> This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
> qdisc_cb_private_validate()
>
> So increase data size by 4
>
> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
> ---
>   include/net/flow_keys.h   | 14 ++++++++++++++
>   include/net/sch_generic.h |  2 +-
>   net/core/flow_dissector.c |  1 +
>   3 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
> index 7e64bd8..fbefdca 100644
> --- a/include/net/flow_keys.h
> +++ b/include/net/flow_keys.h
> @@ -1,6 +1,19 @@
>   #ifndef _NET_FLOW_KEYS_H
>   #define _NET_FLOW_KEYS_H
>
> +/* struct flow_keys:
> + *	@src: source ip address in case of IPv4
> + *	      For IPv6 it contains 32bit hash of src address
> + *	@dst: destination ip address in case of IPv4
> + *	      For IPv6 it contains 32bit hash of dst address
> + *	@ports: port numbers of Transport header
> + *		port16[0]: src port number
> + *		port16[1]: dst port number
> + *	@thoff: Transport header offset
> + *	@n_proto: Network header protocol (eg. IPv4/IPv6)
> + *	@ip_proto: Transport header protocol (eg. TCP/UDP)
> + * All the members, except thoff, are in network byte order.
> + */
>   struct flow_keys {
>   	/* (src,dst) must be grouped, in the same way than in IP header */
>   	__be32 src;
> @@ -10,6 +23,7 @@ struct flow_keys {
>   		__be16 port16[2];
>   	};
>   	u16 thoff;
> +	u16 n_proto;
>   	u8 ip_proto;
>   };
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 624f985..a3cfb8e 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>   	unsigned int		pkt_len;
>   	u16			slave_dev_queue_mapping;
>   	u16			_pad;
> -	unsigned char		data[20];
> +	unsigned char		data[24];

I'm wondering if this is actually needed. We add an extra
u16 n_proto into the flow_keys *just* to determine IPv4/v6
while if it finds anything else than this the dissector
returns false anyway w/o filling out the flow keys structure.
Plus, in case of IPv6 you'll have a hashed/folded src/dst
addr anyway.

>   };
>
>   static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 107ed12..c2b53c1 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -175,6 +175,7 @@ ipv6:
>   		break;
>   	}
>
> +	flow->n_proto = proto;
>   	flow->ip_proto = ip_proto;
>   	flow->ports = skb_flow_get_ports(skb, nhoff, ip_proto);
>   	flow->thoff = (u16) nhoff;
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 4/8] enic: alloc/free rx_cpu_rmap
  2014-06-09 18:44   ` Sergei Shtylyov
@ 2014-06-10 13:52     ` Govindarajulu Varadarajan
  2014-06-10 15:00       ` Sergei Shtylyov
  0 siblings, 1 reply; 36+ messages in thread
From: Govindarajulu Varadarajan @ 2014-06-10 13:52 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Govindarajulu Varadarajan, davem, netdev, ssujith, gvaradar, benve



On Mon, 9 Jun 2014, Sergei Shtylyov wrote:

> @@ -1192,6 +1195,33 @@ static void enic_calc_int_moderation(struct enic 
>> *enic, struct vnic_rq *rq)
>>   	pkt_size_counter->small_pkt_bytes_cnt = 0;
>>   }
>> 
>> +#ifdef CONFIG_RFS_ACCEL
>> +static void enic_free_rx_cpu_rmap(struct enic *enic)
>> +{
>> +	free_irq_cpu_rmap(enic->netdev->rx_cpu_rmap);
>> +	enic->netdev->rx_cpu_rmap = NULL;
>> +}
>> +
>> +static inline void enic_set_rx_cpu_rmap(struct enic *enic)
>
>   No need to use *inline* in a .c file, the compiler should figure it out.
>

Yes, I agree.

>> +{
>> +	int i, res;
>> +
>> +	if (vnic_dev_get_intr_mode(enic->vdev) == VNIC_DEV_INTR_MODE_MSIX) {
>> +		enic->netdev->rx_cpu_rmap = 
>> alloc_irq_cpu_rmap(enic->rq_count);
>> +		if (unlikely(!enic->netdev->rx_cpu_rmap))
>> +			return;
>> +		for (i = 0; i < enic->rq_count; i++) {
>> +			res = irq_cpu_rmap_add(enic->netdev->rx_cpu_rmap,
>> +					       enic->msix_entry[i].vector);
>> +			if (unlikely(res)) {
>> +				enic_free_rx_cpu_rmap(enic);
>> +				return;
>> +			}
>> +		}
>> +	}
>> +}
>
>   It's better to do the following here:
>
> #else
> static void enic_free_rx_cpu_rmap(struct enic *enic)
> {
> }
> static void enic_set_rx_cpu_rmap(struct enic *enic)
> {
> }
>

How about

static void enic_free_rx_cpu_rmap(struct enic *enic)
{
#ifdef CONFIG_RFS_ACCEL
...
...
#endif
}

I prefer this over yours because, if I use yours tools like cscope finds two
definitions of function enic_free_rx_cpu_rmap. Which makes code walk through
little bit difficult.

Thanks
Govind

>> +#endif
>> +
>>   static int enic_poll_msix(struct napi_struct *napi, int budget)
>>   {
>>   	struct net_device *netdev = napi->dev;
>> @@ -1267,6 +1297,9 @@ static void enic_free_intr(struct enic *enic)
>>   	struct net_device *netdev = enic->netdev;
>>   	unsigned int i;
>> 
>> +#ifdef CONFIG_RFS_ACCEL
>> +	enic_free_rx_cpu_rmap(enic);
>> +#endif
>
>   ... so that you can avoid #ifdef's at the call sites.
>
> WBR, Sergei
>
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect()
  2014-06-10 13:11   ` Daniel Borkmann
@ 2014-06-10 14:08     ` Govindarajulu Varadarajan
  2014-06-10 14:26       ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Govindarajulu Varadarajan @ 2014-06-10 14:08 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Govindarajulu Varadarajan, davem, netdev, ssujith, gvaradar,
	benve, eric.dumazet



On Tue, 10 Jun 2014, Daniel Borkmann wrote:
>> diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
>> index 7e64bd8..fbefdca 100644
>> --- a/include/net/flow_keys.h
>> +++ b/include/net/flow_keys.h
>> @@ -1,6 +1,19 @@
>>   #ifndef _NET_FLOW_KEYS_H
>>   #define _NET_FLOW_KEYS_H
>> 
>> +/* struct flow_keys:
>> + *	@src: source ip address in case of IPv4
>> + *	      For IPv6 it contains 32bit hash of src address
>> + *	@dst: destination ip address in case of IPv4
>> + *	      For IPv6 it contains 32bit hash of dst address
>> + *	@ports: port numbers of Transport header
>> + *		port16[0]: src port number
>> + *		port16[1]: dst port number
>> + *	@thoff: Transport header offset
>> + *	@n_proto: Network header protocol (eg. IPv4/IPv6)
>> + *	@ip_proto: Transport header protocol (eg. TCP/UDP)
>> + * All the members, except thoff, are in network byte order.
>> + */
>>   struct flow_keys {
>>   	/* (src,dst) must be grouped, in the same way than in IP header */
>>   	__be32 src;
>> @@ -10,6 +23,7 @@ struct flow_keys {
>>   		__be16 port16[2];
>>   	};
>>   	u16 thoff;
>> +	u16 n_proto;
>>   	u8 ip_proto;
>>   };
>> 
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index 624f985..a3cfb8e 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>>   	unsigned int		pkt_len;
>>   	u16			slave_dev_queue_mapping;
>>   	u16			_pad;
>> -	unsigned char		data[20];
>> +	unsigned char		data[24];
>
> I'm wondering if this is actually needed. We add an extra
> u16 n_proto into the flow_keys *just* to determine IPv4/v6
> while if it finds anything else than this the dissector
> returns false anyway w/o filling out the flow keys structure.
> Plus, in case of IPv6 you'll have a hashed/folded src/dst
> addr anyway.
>

determining IPv4/IPv6 is important because this can be used in dissecting flow
in Accelerated RFS. Adaptor does not support IPv6 filters. Since Accelerated
RFS is supported for IPV6, using skb_flow_dissect will return true with non-zero
values in src/dst for IPv6.

If I am going to write separate function for getting IP address and port
numbers, its definition is going to be somewhat same as skb_flow_dissect.
Why not improve whats already written and reuse it?

Is there any significant downside of adding u16 n_proto and increasing
size of qdisc_skb_cb by 4 bytes?

Thanks
Govind

>>   };
>>
>>   static inline void qdisc_cb_private_validate(const struct sk_buff *skb, 
>> int sz)
>> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>> index 107ed12..c2b53c1 100644
>> --- a/net/core/flow_dissector.c
>> +++ b/net/core/flow_dissector.c
>> @@ -175,6 +175,7 @@ ipv6:
>>   		break;
>>   	}
>> 
>> +	flow->n_proto = proto;
>>   	flow->ip_proto = ip_proto;
>>   	flow->ports = skb_flow_get_ports(skb, nhoff, ip_proto);
>>   	flow->thoff = (u16) nhoff;
>> 
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect()
  2014-06-10 14:08     ` Govindarajulu Varadarajan
@ 2014-06-10 14:26       ` Eric Dumazet
  2014-06-11 22:06         ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2014-06-10 14:26 UTC (permalink / raw)
  To: Govindarajulu Varadarajan
  Cc: Daniel Borkmann, davem, netdev, ssujith, gvaradar, benve

On Tue, 2014-06-10 at 19:38 +0530, Govindarajulu Varadarajan wrote:
> 

> determining IPv4/IPv6 is important because this can be used in dissecting flow
> in Accelerated RFS. Adaptor does not support IPv6 filters. Since Accelerated
> RFS is supported for IPV6, using skb_flow_dissect will return true with non-zero
> values in src/dst for IPv6.
> 
> If I am going to write separate function for getting IP address and port
> numbers, its definition is going to be somewhat same as skb_flow_dissect.
> Why not improve whats already written and reuse it?
> 
> Is there any significant downside of adding u16 n_proto and increasing
> size of qdisc_skb_cb by 4 bytes?

You can avoid this increase (might be bad for IB, hard to tell), by
changing sch_choke.c to only store a part of the struct flow_keys.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 4/8] enic: alloc/free rx_cpu_rmap
  2014-06-10 13:52     ` Govindarajulu Varadarajan
@ 2014-06-10 15:00       ` Sergei Shtylyov
  0 siblings, 0 replies; 36+ messages in thread
From: Sergei Shtylyov @ 2014-06-10 15:00 UTC (permalink / raw)
  To: Govindarajulu Varadarajan; +Cc: davem, netdev, ssujith, gvaradar, benve

Hello.

On 06/10/2014 05:52 PM, Govindarajulu Varadarajan wrote:

> >> @@ -1192,6 +1195,33 @@ static void enic_calc_int_moderation(struct enic
>>> *enic, struct vnic_rq *rq)
>>>       pkt_size_counter->small_pkt_bytes_cnt = 0;
>>>   }
>>>
>>> +#ifdef CONFIG_RFS_ACCEL
>>> +static void enic_free_rx_cpu_rmap(struct enic *enic)
>>> +{
>>> +    free_irq_cpu_rmap(enic->netdev->rx_cpu_rmap);
>>> +    enic->netdev->rx_cpu_rmap = NULL;
>>> +}
>>> +
>>> +static inline void enic_set_rx_cpu_rmap(struct enic *enic)

>>   No need to use *inline* in a .c file, the compiler should figure it out.

> Yes, I agree.

>>> +{
>>> +    int i, res;
>>> +
>>> +    if (vnic_dev_get_intr_mode(enic->vdev) == VNIC_DEV_INTR_MODE_MSIX) {
>>> +        enic->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(enic->rq_count);
>>> +        if (unlikely(!enic->netdev->rx_cpu_rmap))
>>> +            return;
>>> +        for (i = 0; i < enic->rq_count; i++) {
>>> +            res = irq_cpu_rmap_add(enic->netdev->rx_cpu_rmap,
>>> +                           enic->msix_entry[i].vector);
>>> +            if (unlikely(res)) {
>>> +                enic_free_rx_cpu_rmap(enic);
>>> +                return;
>>> +            }
>>> +        }
>>> +    }
>>> +}

>>   It's better to do the following here:

>> #else
>> static void enic_free_rx_cpu_rmap(struct enic *enic)
>> {
>> }
>> static void enic_set_rx_cpu_rmap(struct enic *enic)
>> {
>> }

> How about

> static void enic_free_rx_cpu_rmap(struct enic *enic)
> {
> #ifdef CONFIG_RFS_ACCEL
> ...
> ...
> #endif
> }

> I prefer this over yours because, if I use yours tools like cscope finds two
> definitions of function enic_free_rx_cpu_rmap. Which makes code walk through
> little bit difficult.

    #ifdef's in the function bodies are generally frowned upon. See 
Documentation/SubmittingPatches section 2 item (2).

> Thanks
> Govind

>>> +#endif
>>> +
>>>   static int enic_poll_msix(struct napi_struct *napi, int budget)
>>>   {
>>>       struct net_device *netdev = napi->dev;
>>> @@ -1267,6 +1297,9 @@ static void enic_free_intr(struct enic *enic)
>>>       struct net_device *netdev = enic->netdev;
>>>       unsigned int i;
>>>
>>> +#ifdef CONFIG_RFS_ACCEL
>>> +    enic_free_rx_cpu_rmap(enic);
>>> +#endif

>>   ... so that you can avoid #ifdef's at the call sites.

WBR, Sergei

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect()
  2014-06-10 14:26       ` Eric Dumazet
@ 2014-06-11 22:06         ` David Miller
  2014-06-11 22:08           ` David Miller
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2014-06-11 22:06 UTC (permalink / raw)
  To: eric.dumazet; +Cc: _govind, dborkman, netdev, ssujith, gvaradar, benve

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 10 Jun 2014 07:26:30 -0700

> On Tue, 2014-06-10 at 19:38 +0530, Govindarajulu Varadarajan wrote:
>> 
> 
>> determining IPv4/IPv6 is important because this can be used in dissecting flow
>> in Accelerated RFS. Adaptor does not support IPv6 filters. Since Accelerated
>> RFS is supported for IPV6, using skb_flow_dissect will return true with non-zero
>> values in src/dst for IPv6.
>> 
>> If I am going to write separate function for getting IP address and port
>> numbers, its definition is going to be somewhat same as skb_flow_dissect.
>> Why not improve whats already written and reuse it?
>> 
>> Is there any significant downside of adding u16 n_proto and increasing
>> size of qdisc_skb_cb by 4 bytes?
> 
> You can avoid this increase (might be bad for IB, hard to tell), by
> changing sch_choke.c to only store a part of the struct flow_keys.

I think this is fine, IPOIB's control block will need still just 44
bytes after these changes, so there will still be 4 bytes to spare.

I'm going to apply this series.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect()
  2014-06-11 22:06         ` David Miller
@ 2014-06-11 22:08           ` David Miller
  2014-06-12  5:38             ` Govindarajulu Varadarajan
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2014-06-11 22:08 UTC (permalink / raw)
  To: eric.dumazet; +Cc: _govind, dborkman, netdev, ssujith, gvaradar, benve

From: David Miller <davem@davemloft.net>
Date: Wed, 11 Jun 2014 15:06:36 -0700 (PDT)

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 10 Jun 2014 07:26:30 -0700
> 
>> On Tue, 2014-06-10 at 19:38 +0530, Govindarajulu Varadarajan wrote:
>>> 
>> 
>>> determining IPv4/IPv6 is important because this can be used in dissecting flow
>>> in Accelerated RFS. Adaptor does not support IPv6 filters. Since Accelerated
>>> RFS is supported for IPV6, using skb_flow_dissect will return true with non-zero
>>> values in src/dst for IPv6.
>>> 
>>> If I am going to write separate function for getting IP address and port
>>> numbers, its definition is going to be somewhat same as skb_flow_dissect.
>>> Why not improve whats already written and reuse it?
>>> 
>>> Is there any significant downside of adding u16 n_proto and increasing
>>> size of qdisc_skb_cb by 4 bytes?
>> 
>> You can avoid this increase (might be bad for IB, hard to tell), by
>> changing sch_choke.c to only store a part of the struct flow_keys.
> 
> I think this is fine, IPOIB's control block will need still just 44
> bytes after these changes, so there will still be 4 bytes to spare.
> 
> I'm going to apply this series.

Actually, I change my mind, Govindarajulu can you address Sergei's feedback
in your other changes and repost this series?

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect()
  2014-06-11 22:08           ` David Miller
@ 2014-06-12  5:38             ` Govindarajulu Varadarajan
  0 siblings, 0 replies; 36+ messages in thread
From: Govindarajulu Varadarajan @ 2014-06-12  5:38 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, _govind, dborkman, netdev, ssujith, gvaradar, benve



On Wed, 11 Jun 2014, David Miller wrote:

> Actually, I change my mind, Govindarajulu can you address Sergei's feedback
> in your other changes and repost this series?
>

Sure, I will send v2 soon.

Thanks
Govind

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect()
  2014-06-09 18:32 ` [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect() Govindarajulu Varadarajan
  2014-06-10 13:11   ` Daniel Borkmann
@ 2014-06-19 18:06   ` Chema Gonzalez
  2014-06-19 18:52     ` Govindarajulu Varadarajan
  2014-06-24  5:29   ` Yinghai Lu
  2 siblings, 1 reply; 36+ messages in thread
From: Chema Gonzalez @ 2014-06-19 18:06 UTC (permalink / raw)
  To: Govindarajulu Varadarajan; +Cc: davem, netdev, ssujith, gvaradar, benve

On Mon, Jun 9, 2014 at 11:32 AM, Govindarajulu Varadarajan
<_govind@gmx.com> wrote:
> skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
> give any information about IPv4 or IPv6.
>
> This patch adds new member, n_proto, to struct flow_keys. Which records the
> IP layer type. i.e IPv4 or IPv6.
>
> This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
>
> Adding new member to flow_keys increases the struct size by around 4 bytes.
> This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
> qdisc_cb_private_validate()
>
> So increase data size by 4
>
> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
> ---
>  include/net/flow_keys.h   | 14 ++++++++++++++
>  include/net/sch_generic.h |  2 +-
>  net/core/flow_dissector.c |  1 +
>  3 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
> index 7e64bd8..fbefdca 100644
> --- a/include/net/flow_keys.h
> +++ b/include/net/flow_keys.h
> @@ -1,6 +1,19 @@
>  #ifndef _NET_FLOW_KEYS_H
>  #define _NET_FLOW_KEYS_H
>
> +/* struct flow_keys:
> + *     @src: source ip address in case of IPv4
> + *           For IPv6 it contains 32bit hash of src address
> + *     @dst: destination ip address in case of IPv4
> + *           For IPv6 it contains 32bit hash of dst address
> + *     @ports: port numbers of Transport header
> + *             port16[0]: src port number
> + *             port16[1]: dst port number
> + *     @thoff: Transport header offset
> + *     @n_proto: Network header protocol (eg. IPv4/IPv6)
> + *     @ip_proto: Transport header protocol (eg. TCP/UDP)
> + * All the members, except thoff, are in network byte order.
> + */
>  struct flow_keys {
>         /* (src,dst) must be grouped, in the same way than in IP header */
>         __be32 src;
> @@ -10,6 +23,7 @@ struct flow_keys {
>                 __be16 port16[2];
>         };
>         u16 thoff;
> +       u16 n_proto;
>         u8 ip_proto;
If you add n_proto, have you considered adding nhoff too? That way the
L3 and L4 headers will be completely pinned by the flow dissector.

-Chema


>  };
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 624f985..a3cfb8e 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>         unsigned int            pkt_len;
>         u16                     slave_dev_queue_mapping;
>         u16                     _pad;
> -       unsigned char           data[20];
> +       unsigned char           data[24];
>  };
>
>  static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 107ed12..c2b53c1 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -175,6 +175,7 @@ ipv6:
>                 break;
>         }
>
> +       flow->n_proto = proto;
>         flow->ip_proto = ip_proto;
>         flow->ports = skb_flow_get_ports(skb, nhoff, ip_proto);
>         flow->thoff = (u16) nhoff;
> --
> 2.0.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect()
  2014-06-19 18:06   ` Chema Gonzalez
@ 2014-06-19 18:52     ` Govindarajulu Varadarajan
  0 siblings, 0 replies; 36+ messages in thread
From: Govindarajulu Varadarajan @ 2014-06-19 18:52 UTC (permalink / raw)
  To: Chema Gonzalez
  Cc: Govindarajulu Varadarajan, davem, netdev, ssujith, gvaradar, benve

On Thu, 19 Jun 2014, Chema Gonzalez wrote:
> If you add n_proto, have you considered adding nhoff too? That way the
> L3 and L4 headers will be completely pinned by the flow dissector.
>

Yes, I have actually considered it. But right now there is no one who would need
nhoff. We can add it later when we need it?

Thanks

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect()
  2014-06-09 18:32 ` [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect() Govindarajulu Varadarajan
  2014-06-10 13:11   ` Daniel Borkmann
  2014-06-19 18:06   ` Chema Gonzalez
@ 2014-06-24  5:29   ` Yinghai Lu
  2014-06-26  6:34     ` Govindarajulu Varadarajan
  2 siblings, 1 reply; 36+ messages in thread
From: Yinghai Lu @ 2014-06-24  5:29 UTC (permalink / raw)
  To: Govindarajulu Varadarajan; +Cc: David Miller, NetDev, ssujith, gvaradar, benve

On Mon, Jun 9, 2014 at 11:32 AM, Govindarajulu Varadarajan
<_govind@gmx.com> wrote:
> skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
> give any information about IPv4 or IPv6.
>
> This patch adds new member, n_proto, to struct flow_keys. Which records the
> IP layer type. i.e IPv4 or IPv6.
>
> This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.
>
> Adding new member to flow_keys increases the struct size by around 4 bytes.
> This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
> qdisc_cb_private_validate()
>
> So increase data size by 4
>
> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
> ---
>  include/net/flow_keys.h   | 14 ++++++++++++++
>  include/net/sch_generic.h |  2 +-
>  net/core/flow_dissector.c |  1 +
>  3 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
> index 7e64bd8..fbefdca 100644
> --- a/include/net/flow_keys.h
> +++ b/include/net/flow_keys.h
> @@ -1,6 +1,19 @@
>  #ifndef _NET_FLOW_KEYS_H
>  #define _NET_FLOW_KEYS_H
>
> +/* struct flow_keys:
> + *     @src: source ip address in case of IPv4
> + *           For IPv6 it contains 32bit hash of src address
> + *     @dst: destination ip address in case of IPv4
> + *           For IPv6 it contains 32bit hash of dst address
> + *     @ports: port numbers of Transport header
> + *             port16[0]: src port number
> + *             port16[1]: dst port number
> + *     @thoff: Transport header offset
> + *     @n_proto: Network header protocol (eg. IPv4/IPv6)
> + *     @ip_proto: Transport header protocol (eg. TCP/UDP)
> + * All the members, except thoff, are in network byte order.
> + */
>  struct flow_keys {
>         /* (src,dst) must be grouped, in the same way than in IP header */
>         __be32 src;
> @@ -10,6 +23,7 @@ struct flow_keys {
>                 __be16 port16[2];
>         };
>         u16 thoff;
> +       u16 n_proto;
>         u8 ip_proto;
>  };
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 624f985..a3cfb8e 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>         unsigned int            pkt_len;
>         u16                     slave_dev_queue_mapping;
>         u16                     _pad;
> -       unsigned char           data[20];
> +       unsigned char           data[24];
>  };
>
>  static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 107ed12..c2b53c1 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -175,6 +175,7 @@ ipv6:
>                 break;
>         }
>
> +       flow->n_proto = proto;
>         flow->ip_proto = ip_proto;
>         flow->ports = skb_flow_get_ports(skb, nhoff, ip_proto);
>         flow->thoff = (u16) nhoff;
> --
> 2.0.0
>
this patch in net-next cause kernel crash.

[  148.466045] qlge 0000:4a:00.1 eth27: Passed Get Port Configuration.
[  162.385445] BUG: unable to handle kernel paging request at 000000010000007e
[  162.385839] IP: [<ffffffff81f18899>] __dev_queue_xmit+0x399/0x630
[  162.398541] PGD 0
[  162.398659] Oops: 0002 [#1] SMP
[  162.398845] Modules linked in:
[  162.399022] CPU: 5 PID: 1 Comm: swapper/0 Tainted: G        W
3.16.0-rc2-yh-00302-g3d5dc41-dirty #22
[  162.418490] Hardware name: Oracle Corporation  unknown       /
, BIOS 11016600    05/17/2011
[  162.438851] task: ffff884027a80000 ti: ffff881027d20000 task.ti:
ffff881027d20000
[  162.468329] RIP: 0010:[<ffffffff81f18899>]  [<ffffffff81f18899>]
__dev_queue_xmit+0x399/0x630
[  162.488085] RSP: 0000:ffff881027d23d28  EFLAGS: 00010202
[  162.488345] RAX: 00000000fffffffe RBX: ffff887026041000 RCX: 0000000000000001
[  162.508245] RDX: 0000000000000000 RSI: ffffffff82dfee78 RDI: ffff884027a80000
[  162.508590] RBP: ffff881027d23d70 R08: 0000000000000001 R09: 0000000000000000
[  162.528255] R10: 0000000000000000 R11: ffff885026020800 R12: ffffffff82dfedc0
[  162.547963] R13: ffff881022b7c000 R14: 0000000000000dac R15: ffff88702434c400
[  162.548310] FS:  0000000000000000(0000) GS:ffff88103f000000(0000)
knlGS:0000000000000000
[  162.568186] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  162.568489] CR2: 000000010000007e CR3: 0000000002c19000 CR4: 00000000000007e0
[  162.588263] Stack:
[  162.588354]  ffffffff81f18505 0000000000000000 00ff881027d23d80
ffffffff82dfee60
[  162.608269]  ffff885023dd1e40 ffff881022b7c000 ffff885026020929
0000000000000dac
[  162.628043]  ffff887026041000 ffff881027d23d80 ffffffff81f18b40
ffff881027d23e18
[  162.628422] Call Trace:
[  162.647963]  [<ffffffff81f18505>] ? __dev_queue_xmit+0x5/0x630
[  162.648284]  [<ffffffff81f18b40>] dev_queue_xmit+0x10/0x20
[  162.667987]  [<ffffffff83087b64>] ip_auto_config+0x8e6/0xf13
[  162.668282]  [<ffffffff8100031d>] ? do_one_initcall+0xdd/0x1e0
[  162.688018]  [<ffffffff810e36ad>] ? trace_hardirqs_on+0xd/0x10
[  162.688298]  [<ffffffff8110c40f>] ? ktime_get+0xbf/0x140
[  162.708029]  [<ffffffff8308727e>] ? root_nfs_parse_addr+0xbd/0xbd
[  162.708292]  [<ffffffff81000323>] do_one_initcall+0xe3/0x1e0
[  162.728075]  [<ffffffff810ba8bd>] ? parse_args+0x1ed/0x330
[  162.728340]  [<ffffffff8203501a>] ? printk+0x54/0x56
[  162.748027]  [<ffffffff8301f4c5>] kernel_init_freeable+0x237/0x2ce
[  162.748344]  [<ffffffff8301eaf7>] ? do_early_param+0x8a/0x8a
[  162.768070]  [<ffffffff8202abb0>] ? rest_init+0xc0/0xc0
[  162.768318]  [<ffffffff8202abbe>] kernel_init+0xe/0x100
[  162.788057]  [<ffffffff8204c4ac>] ret_from_fork+0x7c/0xb0
[  162.788314]  [<ffffffff8202abb0>] ? rest_init+0xc0/0xc0
[  162.808008] Code: e8 5d 48 18 ff eb 13 48 c7 c7 60 d9 c5 82 e8 cf
bc 1c ff 85 c0 74 dd 0f 1f 00 48 8b 43 58 48 83 e0 fe 48 85 c0 48 89
43 58 74 07 <f0> ff 80 80 00 00 00 4c 89 e6 48 89 df 41 ff 14 24 41 89
c6 41
[  162.828797] RIP  [<ffffffff81f18899>] __dev_queue_xmit+0x399/0x630
[  162.848186]  RSP <ffff881027d23d28>
[  162.848341] CR2: 000000010000007e
[  162.848490] ---[ end trace 26b7736a09036e46 ]---
[  162.868194] Kernel panic - not syncing: Fatal exception in interrupt
[  162.872673] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation
range: 0xffffffff80000000-0xffffffff9fffffff)
[  162.888531] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[  162.908306] ------------[ cut here ]------------

After the commit is reverted, the system work again.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect()
  2014-06-24  5:29   ` Yinghai Lu
@ 2014-06-26  6:34     ` Govindarajulu Varadarajan
  2014-09-18 14:18       ` Or Gerlitz
  0 siblings, 1 reply; 36+ messages in thread
From: Govindarajulu Varadarajan @ 2014-06-26  6:34 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Govindarajulu Varadarajan, David Miller, NetDev, ssujith,
	gvaradar, benve


On Mon, 23 Jun 2014, Yinghai Lu wrote:
> this patch in net-next cause kernel crash.
>
> [  148.466045] qlge 0000:4a:00.1 eth27: Passed Get Port Configuration.
> [  162.385445] BUG: unable to handle kernel paging request at 000000010000007e
> [  162.385839] IP: [<ffffffff81f18899>] __dev_queue_xmit+0x399/0x630
> [  162.398541] PGD 0
> [  162.398659] Oops: 0002 [#1] SMP
> [  162.398845] Modules linked in:
> [  162.399022] CPU: 5 PID: 1 Comm: swapper/0 Tainted: G        W
> 3.16.0-rc2-yh-00302-g3d5dc41-dirty #22
> [  162.418490] Hardware name: Oracle Corporation  unknown       /
> , BIOS 11016600    05/17/2011
> [  162.438851] task: ffff884027a80000 ti: ffff881027d20000 task.ti:
> ffff881027d20000
> [  162.468329] RIP: 0010:[<ffffffff81f18899>]  [<ffffffff81f18899>]
> __dev_queue_xmit+0x399/0x630
> [  162.488085] RSP: 0000:ffff881027d23d28  EFLAGS: 00010202
> [  162.488345] RAX: 00000000fffffffe RBX: ffff887026041000 RCX: 0000000000000001
> [  162.508245] RDX: 0000000000000000 RSI: ffffffff82dfee78 RDI: ffff884027a80000
> [  162.508590] RBP: ffff881027d23d70 R08: 0000000000000001 R09: 0000000000000000
> [  162.528255] R10: 0000000000000000 R11: ffff885026020800 R12: ffffffff82dfedc0
> [  162.547963] R13: ffff881022b7c000 R14: 0000000000000dac R15: ffff88702434c400
> [  162.548310] FS:  0000000000000000(0000) GS:ffff88103f000000(0000)
> knlGS:0000000000000000
> [  162.568186] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  162.568489] CR2: 000000010000007e CR3: 0000000002c19000 CR4: 00000000000007e0
> [  162.588263] Stack:
> [  162.588354]  ffffffff81f18505 0000000000000000 00ff881027d23d80
> ffffffff82dfee60
> [  162.608269]  ffff885023dd1e40 ffff881022b7c000 ffff885026020929
> 0000000000000dac
> [  162.628043]  ffff887026041000 ffff881027d23d80 ffffffff81f18b40
> ffff881027d23e18
> [  162.628422] Call Trace:
> [  162.647963]  [<ffffffff81f18505>] ? __dev_queue_xmit+0x5/0x630
> [  162.648284]  [<ffffffff81f18b40>] dev_queue_xmit+0x10/0x20
> [  162.667987]  [<ffffffff83087b64>] ip_auto_config+0x8e6/0xf13
> [  162.668282]  [<ffffffff8100031d>] ? do_one_initcall+0xdd/0x1e0
> [  162.688018]  [<ffffffff810e36ad>] ? trace_hardirqs_on+0xd/0x10
> [  162.688298]  [<ffffffff8110c40f>] ? ktime_get+0xbf/0x140
> [  162.708029]  [<ffffffff8308727e>] ? root_nfs_parse_addr+0xbd/0xbd
> [  162.708292]  [<ffffffff81000323>] do_one_initcall+0xe3/0x1e0
> [  162.728075]  [<ffffffff810ba8bd>] ? parse_args+0x1ed/0x330
> [  162.728340]  [<ffffffff8203501a>] ? printk+0x54/0x56
> [  162.748027]  [<ffffffff8301f4c5>] kernel_init_freeable+0x237/0x2ce
> [  162.748344]  [<ffffffff8301eaf7>] ? do_early_param+0x8a/0x8a
> [  162.768070]  [<ffffffff8202abb0>] ? rest_init+0xc0/0xc0
> [  162.768318]  [<ffffffff8202abbe>] kernel_init+0xe/0x100
> [  162.788057]  [<ffffffff8204c4ac>] ret_from_fork+0x7c/0xb0
> [  162.788314]  [<ffffffff8202abb0>] ? rest_init+0xc0/0xc0
> [  162.808008] Code: e8 5d 48 18 ff eb 13 48 c7 c7 60 d9 c5 82 e8 cf
> bc 1c ff 85 c0 74 dd 0f 1f 00 48 8b 43 58 48 83 e0 fe 48 85 c0 48 89
> 43 58 74 07 <f0> ff 80 80 00 00 00 4c 89 e6 48 89 df 41 ff 14 24 41 89
> c6 41
> [  162.828797] RIP  [<ffffffff81f18899>] __dev_queue_xmit+0x399/0x630
> [  162.848186]  RSP <ffff881027d23d28>
> [  162.848341] CR2: 000000010000007e
> [  162.848490] ---[ end trace 26b7736a09036e46 ]---
> [  162.868194] Kernel panic - not syncing: Fatal exception in interrupt
> [  162.872673] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation
> range: 0xffffffff80000000-0xffffffff9fffffff)
> [  162.888531] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
> [  162.908306] ------------[ cut here ]------------
>
> After the commit is reverted, the system work again.

I do not see any problem in my system.

Did you try disecting what "__dev_queue_xmit+0x399/0x630" is?

On what interface did the crash occur on? is it bond interface?

Thanks

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect()
  2014-06-26  6:34     ` Govindarajulu Varadarajan
@ 2014-09-18 14:18       ` Or Gerlitz
  2014-09-18 14:38         ` Eric Dumazet
  2014-09-18 15:02         ` [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes Eric Dumazet
  0 siblings, 2 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-09-18 14:18 UTC (permalink / raw)
  To: Govindarajulu Varadarajan
  Cc: Yinghai Lu, David Miller, NetDev, ssujith, gvaradar,
	Christian Benvenuti (benve)

On Thu, Jun 26, 2014 at 9:34 AM, Govindarajulu Varadarajan
<_govind@gmx.com> wrote:
>
>
> On Mon, 23 Jun 2014, Yinghai Lu wrote:
>>
>> this patch in net-next cause kernel crash.
>>
>> [  148.466045] qlge 0000:4a:00.1 eth27: Passed Get Port Configuration.
>> [  162.385445] BUG: unable to handle kernel paging request at 000000010000007e
>> [  162.385839] IP: [<ffffffff81f18899>] __dev_queue_xmit+0x399/0x630
>> [  162.398541] PGD 0
>> [  162.398659] Oops: 0002 [#1] SMP
>> [  162.398845] Modules linked in:
>> [  162.399022] CPU: 5 PID: 1 Comm: swapper/0 Tainted: G        W
>> 3.16.0-rc2-yh-00302-g3d5dc41-dirty #22
>> [  162.418490] Hardware name: Oracle Corporation  unknown       /
>> , BIOS 11016600    05/17/2011
>> [  162.438851] task: ffff884027a80000 ti: ffff881027d20000 task.ti:
>> ffff881027d20000
>> [  162.468329] RIP: 0010:[<ffffffff81f18899>]  [<ffffffff81f18899>]
>> __dev_queue_xmit+0x399/0x630
>> [  162.488085] RSP: 0000:ffff881027d23d28  EFLAGS: 00010202
>> [  162.488345] RAX: 00000000fffffffe RBX: ffff887026041000 RCX: 0000000000000001
>> [  162.508245] RDX: 0000000000000000 RSI: ffffffff82dfee78 RDI: ffff884027a80000
>> [  162.508590] RBP: ffff881027d23d70 R08: 0000000000000001 R09: 0000000000000000
>> [  162.528255] R10: 0000000000000000 R11: ffff885026020800 R12: ffffffff82dfedc0
>> [  162.547963] R13: ffff881022b7c000 R14: 0000000000000dac R15: ffff88702434c400
>> [  162.548310] FS:  0000000000000000(0000) GS:ffff88103f000000(0000)
>> knlGS:0000000000000000
>> [  162.568186] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [  162.568489] CR2: 000000010000007e CR3: 0000000002c19000 CR4: 00000000000007e0
>> [  162.588263] Stack:
>> [  162.588354]  ffffffff81f18505 0000000000000000 00ff881027d23d80
>> ffffffff82dfee60
>> [  162.608269]  ffff885023dd1e40 ffff881022b7c000 ffff885026020929
>> 0000000000000dac
>> [  162.628043]  ffff887026041000 ffff881027d23d80 ffffffff81f18b40
>> ffff881027d23e18
>> [  162.628422] Call Trace:
>> [  162.647963]  [<ffffffff81f18505>] ? __dev_queue_xmit+0x5/0x630
>> [  162.648284]  [<ffffffff81f18b40>] dev_queue_xmit+0x10/0x20
>> [  162.667987]  [<ffffffff83087b64>] ip_auto_config+0x8e6/0xf13
>> [  162.668282]  [<ffffffff8100031d>] ? do_one_initcall+0xdd/0x1e0
>> [  162.688018]  [<ffffffff810e36ad>] ? trace_hardirqs_on+0xd/0x10
>> [  162.688298]  [<ffffffff8110c40f>] ? ktime_get+0xbf/0x140
>> [  162.708029]  [<ffffffff8308727e>] ? root_nfs_parse_addr+0xbd/0xbd
>> [  162.708292]  [<ffffffff81000323>] do_one_initcall+0xe3/0x1e0
>> [  162.728075]  [<ffffffff810ba8bd>] ? parse_args+0x1ed/0x330
>> [  162.728340]  [<ffffffff8203501a>] ? printk+0x54/0x56
>> [  162.748027]  [<ffffffff8301f4c5>] kernel_init_freeable+0x237/0x2ce
>> [  162.748344]  [<ffffffff8301eaf7>] ? do_early_param+0x8a/0x8a
>> [  162.768070]  [<ffffffff8202abb0>] ? rest_init+0xc0/0xc0
>> [  162.768318]  [<ffffffff8202abbe>] kernel_init+0xe/0x100
>> [  162.788057]  [<ffffffff8204c4ac>] ret_from_fork+0x7c/0xb0
>> [  162.788314]  [<ffffffff8202abb0>] ? rest_init+0xc0/0xc0
>> [  162.808008] Code: e8 5d 48 18 ff eb 13 48 c7 c7 60 d9 c5 82 e8 cf
>> bc 1c ff 85 c0 74 dd 0f 1f 00 48 8b 43 58 48 83 e0 fe 48 85 c0 48 89
>> 43 58 74 07 <f0> ff 80 80 00 00 00 4c 89 e6 48 89 df 41 ff 14 24 41 89
>> c6 41
>> [  162.828797] RIP  [<ffffffff81f18899>] __dev_queue_xmit+0x399/0x630
>> [  162.848186]  RSP <ffff881027d23d28>
>> [  162.848341] CR2: 000000010000007e
>> [  162.848490] ---[ end trace 26b7736a09036e46 ]---
>> [  162.868194] Kernel panic - not syncing: Fatal exception in interrupt
>> [  162.872673] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation
>> range: 0xffffffff80000000-0xffffffff9fffffff)
>> [  162.888531] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
>> [  162.908306] ------------[ cut here ]------------
>>
>> After the commit is reverted, the system work again.
>
>
> I do not see any problem in my system.
>
> Did you try disecting what "__dev_queue_xmit+0x399/0x630" is?
>
> On what interface did the crash occur on? is it bond interface?
>

The crash happens 100% on IPoIB (IP-over-Infiniband) [1] interfaces
b/c your upstream commit e0f31d8 "flow_keys: Record IP layer protocol
in skb_flow_dissect()" causes the IPoIB data stashed on skb->cb [2] to
smash other skb fields.

So your 3.17-rc1 commit introduced a regression to how things work
since kernel 3.2

Can please see how to revert this hunk

-- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -231,7 +231,7 @@ struct qdisc_skb_cb {
        unsigned int            pkt_len;
        u16                     slave_dev_queue_mapping;
        u16                     _pad;
-       unsigned char           data[20];
+       unsigned char           data[24];
 };

thanks,

Or.

[1]  http://marc.info/?l=linux-rdma&m=141029109017035&w=2
[2] see these commits

936d7de3 IPoIB: Stop lying about hard_header_len and use skb->cb to
stash LL addresses
a0417fa3  net: Make qdisc_skb_cb upper size bound explicit

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect()
  2014-09-18 14:18       ` Or Gerlitz
@ 2014-09-18 14:38         ` Eric Dumazet
  2014-09-18 15:02         ` [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes Eric Dumazet
  1 sibling, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2014-09-18 14:38 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Govindarajulu Varadarajan, Yinghai Lu, David Miller, NetDev,
	ssujith, gvaradar, Christian Benvenuti (benve)

On Thu, 2014-09-18 at 17:18 +0300, Or Gerlitz wrote:

> The crash happens 100% on IPoIB (IP-over-Infiniband) [1] interfaces
> b/c your upstream commit e0f31d8 "flow_keys: Record IP layer protocol
> in skb_flow_dissect()" causes the IPoIB data stashed on skb->cb [2] to
> smash other skb fields.
> 
> So your 3.17-rc1 commit introduced a regression to how things work
> since kernel 3.2
> 
> Can please see how to revert this hunk
> 
> -- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>         unsigned int            pkt_len;
>         u16                     slave_dev_queue_mapping;
>         u16                     _pad;
> -       unsigned char           data[20];
> +       unsigned char           data[24];
>  };
> 
> thanks,
> 
> Or.
> 
> [1]  http://marc.info/?l=linux-rdma&m=141029109017035&w=2
> [2] see these commits
> 
> 936d7de3 IPoIB: Stop lying about hard_header_len and use skb->cb to
> stash LL addresses
> a0417fa3  net: Make qdisc_skb_cb upper size bound explicit

I am taking care of this right now guys.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes
  2014-09-18 14:18       ` Or Gerlitz
  2014-09-18 14:38         ` Eric Dumazet
@ 2014-09-18 15:02         ` Eric Dumazet
  2014-09-18 16:26           ` Stephen Hemminger
                             ` (2 more replies)
  1 sibling, 3 replies; 36+ messages in thread
From: Eric Dumazet @ 2014-09-18 15:02 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Govindarajulu Varadarajan, Yinghai Lu, David Miller, NetDev,
	ssujith, gvaradar, Christian Benvenuti (benve)

From: Eric Dumazet <edumazet@google.com>

We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
or increasing skb->cb[] size.

Commit e0f31d849867 ("flow_keys: Record IP layer protocol in
skb_flow_dissect()") broke IPoIB.

Only current offender is sch_choke, and this one do not need an
absolutely precise flow key.

If we store 17 bytes of flow key, its more than enough. (Its the actual
size of flow_keys if it was a packed structure, but we might add new
fields at the end of it later)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: e0f31d849867 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
---
 include/net/sch_generic.h |    3 ++-
 net/sched/sch_choke.c     |   18 ++++++++++++++----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a3cfb8ebeb53..620e086c0cbe 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -231,7 +231,8 @@ struct qdisc_skb_cb {
 	unsigned int		pkt_len;
 	u16			slave_dev_queue_mapping;
 	u16			_pad;
-	unsigned char		data[24];
+#define QDISC_CB_PRIV_LEN 20
+	unsigned char		data[QDISC_CB_PRIV_LEN];
 };
 
 static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index ed30e436128b..fb666d1e4de3 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -133,10 +133,16 @@ static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx)
 	--sch->q.qlen;
 }
 
+/* private part of skb->cb[] that a qdisc is allowed to use
+ * is limited to QDISC_CB_PRIV_LEN bytes.
+ * As a flow key might be too large, we store a part of it only.
+ */
+#define CHOKE_K_LEN min_t(u32, sizeof(struct flow_keys), QDISC_CB_PRIV_LEN - 3)
+
 struct choke_skb_cb {
 	u16			classid;
 	u8			keys_valid;
-	struct flow_keys	keys;
+	u8			keys[QDISC_CB_PRIV_LEN - 3];
 };
 
 static inline struct choke_skb_cb *choke_skb_cb(const struct sk_buff *skb)
@@ -163,22 +169,26 @@ static u16 choke_get_classid(const struct sk_buff *skb)
 static bool choke_match_flow(struct sk_buff *skb1,
 			     struct sk_buff *skb2)
 {
+	struct flow_keys temp;
+
 	if (skb1->protocol != skb2->protocol)
 		return false;
 
 	if (!choke_skb_cb(skb1)->keys_valid) {
 		choke_skb_cb(skb1)->keys_valid = 1;
-		skb_flow_dissect(skb1, &choke_skb_cb(skb1)->keys);
+		skb_flow_dissect(skb1, &temp);
+		memcpy(&choke_skb_cb(skb1)->keys, &temp, CHOKE_K_LEN);
 	}
 
 	if (!choke_skb_cb(skb2)->keys_valid) {
 		choke_skb_cb(skb2)->keys_valid = 1;
-		skb_flow_dissect(skb2, &choke_skb_cb(skb2)->keys);
+		skb_flow_dissect(skb2, &temp);
+		memcpy(&choke_skb_cb(skb2)->keys, &temp, CHOKE_K_LEN);
 	}
 
 	return !memcmp(&choke_skb_cb(skb1)->keys,
 		       &choke_skb_cb(skb2)->keys,
-		       sizeof(struct flow_keys));
+		       CHOKE_K_LEN);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes
  2014-09-18 15:02         ` [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes Eric Dumazet
@ 2014-09-18 16:26           ` Stephen Hemminger
  2014-09-18 16:32             ` Eric Dumazet
  2014-09-18 22:29           ` [net] " Doug Ledford
  2014-09-22 18:22           ` [PATCH net] " David Miller
  2 siblings, 1 reply; 36+ messages in thread
From: Stephen Hemminger @ 2014-09-18 16:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Or Gerlitz, Govindarajulu Varadarajan, Yinghai Lu, David Miller,
	NetDev, ssujith, gvaradar, Christian Benvenuti (benve)

On Thu, 18 Sep 2014 08:02:05 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
> or increasing skb->cb[] size.
> 
> Commit e0f31d849867 ("flow_keys: Record IP layer protocol in
> skb_flow_dissect()") broke IPoIB.
> 
> Only current offender is sch_choke, and this one do not need an
> absolutely precise flow key.
> 
> If we store 17 bytes of flow key, its more than enough. (Its the actual
> size of flow_keys if it was a packed structure, but we might add new
> fields at the end of it later)
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Can we add BUILD_BUG to stop next time something smacks this.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes
  2014-09-18 16:26           ` Stephen Hemminger
@ 2014-09-18 16:32             ` Eric Dumazet
  2014-09-18 18:00               ` Eric Dumazet
  2014-09-18 21:28               ` Or Gerlitz
  0 siblings, 2 replies; 36+ messages in thread
From: Eric Dumazet @ 2014-09-18 16:32 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Or Gerlitz, Govindarajulu Varadarajan, Yinghai Lu, David Miller,
	NetDev, ssujith, gvaradar, Christian Benvenuti (benve)

On Thu, 2014-09-18 at 09:26 -0700, Stephen Hemminger wrote:
> On Thu, 18 Sep 2014 08:02:05 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
> > or increasing skb->cb[] size.
> > 
> > Commit e0f31d849867 ("flow_keys: Record IP layer protocol in
> > skb_flow_dissect()") broke IPoIB.
> > 
> > Only current offender is sch_choke, and this one do not need an
> > absolutely precise flow key.
> > 
> > If we store 17 bytes of flow key, its more than enough. (Its the actual
> > size of flow_keys if it was a packed structure, but we might add new
> > fields at the end of it later)
> > 
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> Can we add BUILD_BUG to stop next time something smacks this.

I though we had.

Maybe IPoIB lacks one.

Or, do you have an idea ?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes
  2014-09-18 16:32             ` Eric Dumazet
@ 2014-09-18 18:00               ` Eric Dumazet
  2014-09-18 18:07                 ` Joe Perches
  2014-09-18 21:28               ` Or Gerlitz
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2014-09-18 18:00 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Or Gerlitz, Govindarajulu Varadarajan, Yinghai Lu, David Miller,
	NetDev, ssujith, gvaradar, Christian Benvenuti (benve)

On Thu, 2014-09-18 at 09:32 -0700, Eric Dumazet wrote:
> On Thu, 2014-09-18 at 09:26 -0700, Stephen Hemminger wrote:
> > On Thu, 18 Sep 2014 08:02:05 -0700
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > 
> > > From: Eric Dumazet <edumazet@google.com>
> > > 
> > > We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
> > > or increasing skb->cb[] size.
> > > 
> > > Commit e0f31d849867 ("flow_keys: Record IP layer protocol in
> > > skb_flow_dissect()") broke IPoIB.
> > > 
> > > Only current offender is sch_choke, and this one do not need an
> > > absolutely precise flow key.
> > > 
> > > If we store 17 bytes of flow key, its more than enough. (Its the actual
> > > size of flow_keys if it was a packed structure, but we might add new
> > > fields at the end of it later)
> > > 
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > 
> > Can we add BUILD_BUG to stop next time something smacks this.
> 
> I though we had.
> 
> Maybe IPoIB lacks one.
> 
> Or, do you have an idea ?

Seems straightforward ...

Or can you carry this fix for me ?

Thanks

[PATCH] ipoib: validate struct ipoib_cb size

To catch future errors sooner.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 3edce617c31b..d7562beb5423 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -131,6 +131,12 @@ struct ipoib_cb {
 	u8			hwaddr[INFINIBAND_ALEN];
 };
 
+static inline struct ipoib_cb *ipoib_skb_cb(const struct sk_buff *skb)
+{
+	BUILD_BUG_ON(sizeof(skb->cb) < sizeof(struct ipoib_cb));
+	return (struct ipoib_cb *)skb->cb;
+}
+
 /* Used for all multicast joins (broadcast, IPv4 mcast and IPv6 mcast) */
 struct ipoib_mcast {
 	struct ib_sa_mcmember_rec mcmember;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 1310acf6bf92..13e6e0431592 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -716,7 +716,7 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_neigh *neigh;
-	struct ipoib_cb *cb = (struct ipoib_cb *) skb->cb;
+	struct ipoib_cb *cb = ipoib_skb_cb(skb);
 	struct ipoib_header *header;
 	unsigned long flags;
 
@@ -813,7 +813,7 @@ static int ipoib_hard_header(struct sk_buff *skb,
 			     const void *daddr, const void *saddr, unsigned len)
 {
 	struct ipoib_header *header;
-	struct ipoib_cb *cb = (struct ipoib_cb *) skb->cb;
+	struct ipoib_cb *cb = ipoib_skb_cb(skb);
 
 	header = (struct ipoib_header *) skb_push(skb, sizeof *header);
 

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes
  2014-09-18 18:00               ` Eric Dumazet
@ 2014-09-18 18:07                 ` Joe Perches
  2014-09-18 19:14                   ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Joe Perches @ 2014-09-18 18:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, Or Gerlitz, Govindarajulu Varadarajan,
	Yinghai Lu, David Miller, NetDev, ssujith, gvaradar,
	Christian Benvenuti (benve)

On Thu, 2014-09-18 at 11:00 -0700, Eric Dumazet wrote:
> On Thu, 2014-09-18 at 09:32 -0700, Eric Dumazet wrote:
> > On Thu, 2014-09-18 at 09:26 -0700, Stephen Hemminger wrote:
[]
> > Or, do you have an idea ?
> 
> Seems straightforward ...
> 
> Or can you carry this fix for me ?
> 
> Thanks
> 
> [PATCH] ipoib: validate struct ipoib_cb size
> 
> To catch future errors sooner.
[]
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
[]
> +static inline struct ipoib_cb *ipoib_skb_cb(const struct sk_buff *skb)
> +{
> +	BUILD_BUG_ON(sizeof(skb->cb) < sizeof(struct ipoib_cb));
> +	return (struct ipoib_cb *)skb->cb;
> +}

It seems better not to use const for the struct sk_buff * here.

Neither of the uses take a const struct sk_buff *

> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c

> @@ -716,7 +716,7 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct ipoib_dev_priv *priv = netdev_priv(dev);
>  	struct ipoib_neigh *neigh;
> -	struct ipoib_cb *cb = (struct ipoib_cb *) skb->cb;
> +	struct ipoib_cb *cb = ipoib_skb_cb(skb);
>  	struct ipoib_header *header;
>  	unsigned long flags;
>  
> @@ -813,7 +813,7 @@ static int ipoib_hard_header(struct sk_buff *skb,
>  			     const void *daddr, const void *saddr, unsigned len)
>  {
>  	struct ipoib_header *header;
> -	struct ipoib_cb *cb = (struct ipoib_cb *) skb->cb;
> +	struct ipoib_cb *cb = ipoib_skb_cb(skb);
>  
>  	header = (struct ipoib_header *) skb_push(skb, sizeof *header);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes
  2014-09-18 18:07                 ` Joe Perches
@ 2014-09-18 19:14                   ` Eric Dumazet
  2014-09-18 19:31                     ` Joe Perches
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2014-09-18 19:14 UTC (permalink / raw)
  To: Joe Perches
  Cc: Stephen Hemminger, Or Gerlitz, Govindarajulu Varadarajan,
	Yinghai Lu, David Miller, NetDev, ssujith, gvaradar,
	Christian Benvenuti (benve)

On Thu, 2014-09-18 at 11:07 -0700, Joe Perches wrote:

> > +static inline struct ipoib_cb *ipoib_skb_cb(const struct sk_buff *skb)
> > +{
> > +	BUILD_BUG_ON(sizeof(skb->cb) < sizeof(struct ipoib_cb));
> > +	return (struct ipoib_cb *)skb->cb;
> > +}
> 
> It seems better not to use const for the struct sk_buff * here.
> 
> Neither of the uses take a const struct sk_buff *

Thats pretty standard, check for other similar constructs like that.


static inline struct qdisc_skb_cb *qdisc_skb_cb(const struct sk_buff *skb)
{
        return (struct qdisc_skb_cb *)skb->cb;
}

This allows uses of the helper when the skb is only read (has the const qual)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes
  2014-09-18 19:14                   ` Eric Dumazet
@ 2014-09-18 19:31                     ` Joe Perches
  2014-09-18 20:31                       ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Joe Perches @ 2014-09-18 19:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, Or Gerlitz, Govindarajulu Varadarajan,
	Yinghai Lu, David Miller, NetDev, ssujith, gvaradar,
	Christian Benvenuti (benve)

On Thu, 2014-09-18 at 12:14 -0700, Eric Dumazet wrote:
> On Thu, 2014-09-18 at 11:07 -0700, Joe Perches wrote:
> 
> > > +static inline struct ipoib_cb *ipoib_skb_cb(const struct sk_buff *skb)
> > > +{
> > > +	BUILD_BUG_ON(sizeof(skb->cb) < sizeof(struct ipoib_cb));
> > > +	return (struct ipoib_cb *)skb->cb;
> > > +}
> > 
> > It seems better not to use const for the struct sk_buff * here.
> > 
> > Neither of the uses take a const struct sk_buff *
> 
> Thats pretty standard, check for other similar constructs like that.
> 
> 
> static inline struct qdisc_skb_cb *qdisc_skb_cb(const struct sk_buff *skb)
> {
>         return (struct qdisc_skb_cb *)skb->cb;
> }
> 
> This allows uses of the helper when the skb is only read (has the const qual)

I don't mind the const argument, but casting
the return to non-const seems like poor form.

btw: it seems like it's 8:5 non-const to const

$ grep -rP --include=*.[ch] -n "cb\s*\*\s*\w+\s*\(\s*(?:const\s+)?struct\s+sk_buff\s*\*\s*\w+\s*\)" *
drivers/net/ethernet/freescale/gianfar.c:2091:static inline struct txfcb *gfar_add_fcb(struct sk_buff *skb)
drivers/net/wireless/ath/ath10k/core.h:83:static inline struct ath10k_skb_cb *ATH10K_SKB_CB(struct sk_buff *skb)
include/net/mrp.h:38:static inline struct mrp_skb_cb *mrp_cb(struct sk_buff *skb)
include/net/ieee802154_netdev.h:235:static inline struct ieee802154_mac_cb *mac_cb(struct sk_buff *skb)
include/net/ieee802154_netdev.h:240:static inline struct ieee802154_mac_cb *mac_cb_init(struct sk_buff *skb)
include/net/sch_generic.h:251:static inline struct qdisc_skb_cb *qdisc_skb_cb(const struct sk_buff *skb)
include/net/codel.h:95:static struct codel_skb_cb *get_codel_cb(const struct sk_buff *skb)
include/net/garp.h:36:static inline struct garp_skb_cb *garp_cb(struct sk_buff *skb)
net/core/dev.c:2177:static struct dev_kfree_skb_cb *get_kfree_skb_cb(const struct sk_buff *skb)
net/sched/sch_choke.c:142:static inline struct choke_skb_cb *choke_skb_cb(const struct sk_buff *skb)
net/sched/sch_sfb.c:95:static inline struct sfb_skb_cb *sfb_skb_cb(const struct sk_buff *skb)
net/sched/sch_netem.c:171:static inline struct netem_skb_cb *netem_skb_cb(struct sk_buff *skb)
net/sched/sch_sfq.c:167:static inline struct sfq_skb_cb *sfq_skb_cb(const struct sk_buff *skb)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes
  2014-09-18 19:31                     ` Joe Perches
@ 2014-09-18 20:31                       ` Eric Dumazet
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2014-09-18 20:31 UTC (permalink / raw)
  To: Joe Perches
  Cc: Stephen Hemminger, Or Gerlitz, Govindarajulu Varadarajan,
	Yinghai Lu, David Miller, NetDev, ssujith, gvaradar,
	Christian Benvenuti (benve)

On Thu, 2014-09-18 at 12:31 -0700, Joe Perches wrote:

> I don't mind the const argument, but casting
> the return to non-const seems like poor form.

This is the current way to do this, even if we do not like it.

C does not allow to have the same function name for different
parameters, there is nothing we can do about it at this moment.

->

static inline const struct ipoib_cb *ipoib_skb_cb(const struct sk_buff *skb)

static inline struct ipoib_cb *ipoib_skb_cb(struct sk_buff *skb)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes
  2014-09-18 16:32             ` Eric Dumazet
  2014-09-18 18:00               ` Eric Dumazet
@ 2014-09-18 21:28               ` Or Gerlitz
  1 sibling, 0 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-09-18 21:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, Govindarajulu Varadarajan, Yinghai Lu,
	David Miller, NetDev, ssujith, gvaradar,
	Christian Benvenuti (benve)

On Thu, Sep 18, 2014 at 7:32 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2014-09-18 at 09:26 -0700, Stephen Hemminger wrote:
>> On Thu, 18 Sep 2014 08:02:05 -0700
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> > From: Eric Dumazet <edumazet@google.com>
>> >
>> > We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
>> > or increasing skb->cb[] size.
>> >
>> > Commit e0f31d849867 ("flow_keys: Record IP layer protocol in
>> > skb_flow_dissect()") broke IPoIB.
>> >
>> > Only current offender is sch_choke, and this one do not need an
>> > absolutely precise flow key.
>> >
>> > If we store 17 bytes of flow key, its more than enough. (Its the actual
>> > size of flow_keys if it was a packed structure, but we might add new
>> > fields at the end of it later)
>> >
>> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>>
>> Can we add BUILD_BUG to stop next time something smacks this.
>
> I though we had.
>
> Maybe IPoIB lacks one.
>
> Or, do you have an idea ?

Nope, we currently don't have such build time check, I saw you posted
one and I will be able to test it Sunday.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net] net: sched: shrink struct qdisc_skb_cb to 28 bytes
  2014-09-18 15:02         ` [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes Eric Dumazet
  2014-09-18 16:26           ` Stephen Hemminger
@ 2014-09-18 22:29           ` Doug Ledford
  2014-09-19 12:07             ` Or Gerlitz
  2014-09-22 18:38             ` David Miller
  2014-09-22 18:22           ` [PATCH net] " David Miller
  2 siblings, 2 replies; 36+ messages in thread
From: Doug Ledford @ 2014-09-18 22:29 UTC (permalink / raw)
  To: Eric Dumazet, Or Gerlitz
  Cc: Govindarajulu Varadarajan, Yinghai Lu, David Miller, NetDev,
	ssujith, gvaradar, Christian Benvenuti (benve)

On 09/18/2014 11:02 AM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
> or increasing skb->cb[] size.
>
> Commit e0f31d849867 ("flow_keys: Record IP layer protocol in
> skb_flow_dissect()") broke IPoIB.
>
> Only current offender is sch_choke, and this one do not need an
> absolutely precise flow key.
>
> If we store 17 bytes of flow key, its more than enough. (Its the actual
> size of flow_keys if it was a packed structure, but we might add new
> fields at the end of it later)
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Fixes: e0f31d849867 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")

I've installed this patch on my cluster and it resolves the problem.

Tested-by/Acked-by: Doug Ledford <dledford@redhat.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net] net: sched: shrink struct qdisc_skb_cb to 28 bytes
  2014-09-18 22:29           ` [net] " Doug Ledford
@ 2014-09-19 12:07             ` Or Gerlitz
  2014-09-22 18:38             ` David Miller
  1 sibling, 0 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-09-19 12:07 UTC (permalink / raw)
  To: Doug Ledford, David Miller
  Cc: Eric Dumazet, Govindarajulu Varadarajan, Yinghai Lu, NetDev,
	ssujith, gvaradar, Christian Benvenuti (benve)

On Fri, Sep 19, 2014 at 1:29 AM, Doug Ledford <dledford@xsintricity.com> wrote:
> On 09/18/2014 11:02 AM, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>> We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
>> or increasing skb->cb[] size.
>>
>> Commit e0f31d849867 ("flow_keys: Record IP layer protocol in
>> skb_flow_dissect()") broke IPoIB.
>>
>> Only current offender is sch_choke, and this one do not need an
>> absolutely precise flow key.
>>
>> If we store 17 bytes of flow key, its more than enough. (Its the actual
>> size of flow_keys if it was a packed structure, but we might add new
>> fields at the end of it later)
>>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Fixes: e0f31d849867 ("flow_keys: Record IP layer protocol in
>> skb_flow_dissect()")

> I've installed this patch on my cluster and it resolves the problem.
> Tested-by/Acked-by: Doug Ledford <dledford@redhat.com>

Thanks Eric/Doug!

Dave - just to make sure, this is for net, as the regression was
introduced in 3.17-rc1.

Or.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes
  2014-09-18 15:02         ` [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes Eric Dumazet
  2014-09-18 16:26           ` Stephen Hemminger
  2014-09-18 22:29           ` [net] " Doug Ledford
@ 2014-09-22 18:22           ` David Miller
  2 siblings, 0 replies; 36+ messages in thread
From: David Miller @ 2014-09-22 18:22 UTC (permalink / raw)
  To: eric.dumazet
  Cc: or.gerlitz, _govind, yinghai, netdev, ssujith, gvaradar, benve

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 18 Sep 2014 08:02:05 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
> or increasing skb->cb[] size.
> 
> Commit e0f31d849867 ("flow_keys: Record IP layer protocol in
> skb_flow_dissect()") broke IPoIB.
> 
> Only current offender is sch_choke, and this one do not need an
> absolutely precise flow key.
> 
> If we store 17 bytes of flow key, its more than enough. (Its the actual
> size of flow_keys if it was a packed structure, but we might add new
> fields at the end of it later)
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Fixes: e0f31d849867 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net] net: sched: shrink struct qdisc_skb_cb to 28 bytes
  2014-09-18 22:29           ` [net] " Doug Ledford
  2014-09-19 12:07             ` Or Gerlitz
@ 2014-09-22 18:38             ` David Miller
  1 sibling, 0 replies; 36+ messages in thread
From: David Miller @ 2014-09-22 18:38 UTC (permalink / raw)
  To: dledford
  Cc: eric.dumazet, or.gerlitz, _govind, yinghai, netdev, ssujith,
	gvaradar, benve

From: Doug Ledford <dledford@xsintricity.com>
Date: Thu, 18 Sep 2014 18:29:31 -0400

> Tested-by/Acked-by: Doug Ledford <dledford@redhat.com>

Automated tools do not understand this, please explicitly give
separate Tested-by: and Acked-by: tags in the future.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2014-09-22 18:38 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-09 18:32 [PATCH net-next 0/8] enic updates Govindarajulu Varadarajan
2014-06-09 18:32 ` [PATCH net-next 1/8] flow_keys: Record IP layer protocol in skb_flow_dissect() Govindarajulu Varadarajan
2014-06-10 13:11   ` Daniel Borkmann
2014-06-10 14:08     ` Govindarajulu Varadarajan
2014-06-10 14:26       ` Eric Dumazet
2014-06-11 22:06         ` David Miller
2014-06-11 22:08           ` David Miller
2014-06-12  5:38             ` Govindarajulu Varadarajan
2014-06-19 18:06   ` Chema Gonzalez
2014-06-19 18:52     ` Govindarajulu Varadarajan
2014-06-24  5:29   ` Yinghai Lu
2014-06-26  6:34     ` Govindarajulu Varadarajan
2014-09-18 14:18       ` Or Gerlitz
2014-09-18 14:38         ` Eric Dumazet
2014-09-18 15:02         ` [PATCH net] net: sched: shrink struct qdisc_skb_cb to 28 bytes Eric Dumazet
2014-09-18 16:26           ` Stephen Hemminger
2014-09-18 16:32             ` Eric Dumazet
2014-09-18 18:00               ` Eric Dumazet
2014-09-18 18:07                 ` Joe Perches
2014-09-18 19:14                   ` Eric Dumazet
2014-09-18 19:31                     ` Joe Perches
2014-09-18 20:31                       ` Eric Dumazet
2014-09-18 21:28               ` Or Gerlitz
2014-09-18 22:29           ` [net] " Doug Ledford
2014-09-19 12:07             ` Or Gerlitz
2014-09-22 18:38             ` David Miller
2014-09-22 18:22           ` [PATCH net] " David Miller
2014-06-09 18:32 ` [PATCH net-next 2/8] enic: fix return value in _vnic_dev_cmd Govindarajulu Varadarajan
2014-06-09 18:32 ` [PATCH net-next 3/8] enic: devcmd for adding IP 5 tuple hardware filters Govindarajulu Varadarajan
2014-06-09 18:32 ` [PATCH net-next 4/8] enic: alloc/free rx_cpu_rmap Govindarajulu Varadarajan
2014-06-09 18:44   ` Sergei Shtylyov
2014-06-10 13:52     ` Govindarajulu Varadarajan
2014-06-10 15:00       ` Sergei Shtylyov
2014-06-09 18:32 ` [PATCH net-next 5/8] enic: Add Accelerated RFS support Govindarajulu Varadarajan
2014-06-09 18:32 ` [PATCH net-next 6/8] enic: fix lockdep around devcmd_lock Govindarajulu Varadarajan
2014-06-09 18:32 ` [PATCH net-next 7/8] enic: add low latency socket busy_poll support Govindarajulu Varadarajan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.