All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/6] net: Strengthen TX and RX hashes
@ 2015-02-28  3:11 Tom Herbert
  2015-02-28  3:11 ` [PATCH net-next 1/6] net: initialize sk_txhash using random value Tom Herbert
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Tom Herbert @ 2015-02-28  3:11 UTC (permalink / raw)
  To: davem, netdev

Both the TX and RX hashes associated with skb->hash are mostly 4-tuple
hashes. This is potentially problematic. For instance one could
concieve of a DOS attack against a TCP connection using UDP with same
4-tuple.  Most switches already use at least a 5-tuple hash, NICs
probably should also (for instance Toeplitz hash for UDP could include
UDP protocol number). This patch set implements 5-tuple hashes in the
stack.

For transmit, we simply initialize sk_hash to a random value instead of
performing any actual hash computation. A random value should provide
full 32-bits of entropy and ensure that connected sockets have a
different hash value regardless of protocol, 4-tuple, network namespace,
etc. sock_init_txhash was created to provide this.  This also can be
called for any family not just IP.

For receive, we add a new field in flow_keys called hash_extra. When
doing the jhash this value is XORed into the destination address field.
In flow_dissector, the IP protocol is placed in this field so that we
get a 5-tuple hash.

Also, in flow_dissector, we add support for including VLAN ID and GRE
keyid in the hash (via hash_extra). We implement support for extracting
a hash from MPLS entropy labels (RFC 6790).

Tom Herbert (6):
  net: initialize sk_txhash using random value
  flow_dissector: Include ip_proto in hash computation
  flow_dissector: Add hash_extra field to flow_keys struct
  flow_dissector: Include VLAN ID in hash computation
  flow_dissector: Include GRE keyid in hash computation
  flow_dissector: Include MPLS entropy label in hash computation

 include/net/flow_keys.h   |  1 +
 include/net/ip.h          | 13 -------------
 include/net/ipv6.h        | 14 --------------
 include/net/sch_generic.h |  2 +-
 include/net/sock.h        |  6 ++++++
 include/uapi/linux/in.h   |  2 ++
 include/uapi/linux/mpls.h |  3 +++
 net/core/flow_dissector.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
 net/ipv4/datagram.c       |  2 +-
 net/ipv4/tcp_ipv4.c       |  4 ++--
 net/ipv6/datagram.c       |  2 +-
 net/ipv6/tcp_ipv6.c       |  4 ++--
 12 files changed, 61 insertions(+), 36 deletions(-)

-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next 1/6] net: initialize sk_txhash using random value
  2015-02-28  3:11 [PATCH net-next 0/6] net: Strengthen TX and RX hashes Tom Herbert
@ 2015-02-28  3:11 ` Tom Herbert
  2015-02-28  3:11 ` [PATCH net-next 2/6] flow_dissector: Include ip_proto in hash computation Tom Herbert
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Tom Herbert @ 2015-02-28  3:11 UTC (permalink / raw)
  To: davem, netdev

This patch creates sock_init_txhash which simply intializes sk_txhash
to a random value. This satisfies the entropy requirements of the
hash and is stronger than old method of doing flow_hash_from_keys.
This function can be called for any type of connected socket, and will
generate different hash values for same 4-tuple across different
IP protocols or namespaces.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/net/ip.h    | 13 -------------
 include/net/ipv6.h  | 14 --------------
 include/net/sock.h  |  6 ++++++
 net/ipv4/datagram.c |  2 +-
 net/ipv4/tcp_ipv4.c |  4 ++--
 net/ipv6/datagram.c |  2 +-
 net/ipv6/tcp_ipv6.c |  4 ++--
 7 files changed, 12 insertions(+), 33 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 025c61c..11991ed 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -352,19 +352,6 @@ static inline __wsum inet_compute_pseudo(struct sk_buff *skb, int proto)
 				  skb->len, proto, 0);
 }
 
-static inline void inet_set_txhash(struct sock *sk)
-{
-	struct inet_sock *inet = inet_sk(sk);
-	struct flow_keys keys;
-
-	keys.src = inet->inet_saddr;
-	keys.dst = inet->inet_daddr;
-	keys.port16[0] = inet->inet_sport;
-	keys.port16[1] = inet->inet_dport;
-
-	sk->sk_txhash = flow_hash_from_keys(&keys);
-}
-
 static inline __wsum inet_gro_compute_pseudo(struct sk_buff *skb, int proto)
 {
 	const struct iphdr *iph = skb_gro_network_header(skb);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index b767306..3201815 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -691,20 +691,6 @@ static inline int ip6_sk_dst_hoplimit(struct ipv6_pinfo *np, struct flowi6 *fl6,
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
-static inline void ip6_set_txhash(struct sock *sk)
-{
-	struct inet_sock *inet = inet_sk(sk);
-	struct ipv6_pinfo *np = inet6_sk(sk);
-	struct flow_keys keys;
-
-	keys.src = (__force __be32)ipv6_addr_hash(&np->saddr);
-	keys.dst = (__force __be32)ipv6_addr_hash(&sk->sk_v6_daddr);
-	keys.port16[0] = inet->inet_sport;
-	keys.port16[1] = inet->inet_dport;
-
-	sk->sk_txhash = flow_hash_from_keys(&keys);
-}
-
 static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb,
 					__be32 flowlabel, bool autolabel)
 {
diff --git a/include/net/sock.h b/include/net/sock.h
index ab186b1..0f03241 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -56,6 +56,7 @@
 #include <linux/uaccess.h>
 #include <linux/page_counter.h>
 #include <linux/memcontrol.h>
+#include <linux/random.h>
 #include <linux/static_key.h>
 #include <linux/aio.h>
 #include <linux/sched.h>
@@ -880,6 +881,11 @@ static inline void sock_rps_reset_rxhash(struct sock *sk)
 #endif
 }
 
+static inline void sock_init_txhash(struct sock *sk)
+{
+	sk->sk_txhash = prandom_u32();
+}
+
 #define sk_wait_event(__sk, __timeo, __condition)			\
 	({	int __rc;						\
 		release_sock(__sk);					\
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 90c0e83..7a09672 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -76,7 +76,7 @@ int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	inet->inet_daddr = fl4->daddr;
 	inet->inet_dport = usin->sin_port;
 	sk->sk_state = TCP_ESTABLISHED;
-	inet_set_txhash(sk);
+	sock_init_txhash(sk);
 	inet->inet_id = jiffies;
 
 	sk_dst_set(sk, &rt->dst);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5a2dfed..2c2f141 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -222,7 +222,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	if (err)
 		goto failure;
 
-	inet_set_txhash(sk);
+	sock_init_txhash(sk);
 
 	rt = ip_route_newports(fl4, rt, orig_sport, orig_dport,
 			       inet->inet_sport, inet->inet_dport, sk);
@@ -1328,7 +1328,7 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 	newinet->mc_ttl	      = ip_hdr(skb)->ttl;
 	newinet->rcv_tos      = ip_hdr(skb)->tos;
 	inet_csk(newsk)->icsk_ext_hdr_len = 0;
-	inet_set_txhash(newsk);
+	sock_init_txhash(newsk);
 	if (inet_opt)
 		inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
 	newinet->inet_id = newtp->write_seq ^ jiffies;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index c215be7..fc5d3f0 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -199,7 +199,7 @@ ipv4_connected:
 		      NULL);
 
 	sk->sk_state = TCP_ESTABLISHED;
-	ip6_set_txhash(sk);
+	sock_init_txhash(sk);
 out:
 	fl6_sock_release(flowlabel);
 	return err;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 5d46832..ef7e968 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -295,7 +295,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	if (err)
 		goto late_failure;
 
-	ip6_set_txhash(sk);
+	sock_init_txhash(sk);
 
 	if (!tp->write_seq && likely(!tp->repair))
 		tp->write_seq = secure_tcpv6_sequence_number(np->saddr.s6_addr32,
@@ -1155,7 +1155,7 @@ static struct sock *tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 	newsk->sk_v6_rcv_saddr = ireq->ir_v6_loc_addr;
 	newsk->sk_bound_dev_if = ireq->ir_iif;
 
-	ip6_set_txhash(newsk);
+	sock_init_txhash(newsk);
 
 	/* Now IPv6 options...
 
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 2/6] flow_dissector: Include ip_proto in hash computation
  2015-02-28  3:11 [PATCH net-next 0/6] net: Strengthen TX and RX hashes Tom Herbert
  2015-02-28  3:11 ` [PATCH net-next 1/6] net: initialize sk_txhash using random value Tom Herbert
@ 2015-02-28  3:11 ` Tom Herbert
  2015-02-28  3:11 ` [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct Tom Herbert
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Tom Herbert @ 2015-02-28  3:11 UTC (permalink / raw)
  To: davem, netdev

Fold ip_proto into dst when computing hash. This provides a more
standard 5-tuple hash.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/core/flow_dissector.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 2c35c02..f73a248 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -276,6 +276,7 @@ static __always_inline u32 __flow_hash_3words(u32 a, u32 b, u32 c)
 static inline u32 __flow_hash_from_keys(struct flow_keys *keys)
 {
 	u32 hash;
+	u32 extra;
 
 	/* get a consistent hash (same value on both flow directions) */
 	if (((__force u32)keys->dst < (__force u32)keys->src) ||
@@ -285,7 +286,8 @@ static inline u32 __flow_hash_from_keys(struct flow_keys *keys)
 		swap(keys->port16[0], keys->port16[1]);
 	}
 
-	hash = __flow_hash_3words((__force u32)keys->dst,
+	extra = keys->ip_proto;
+	hash = __flow_hash_3words((__force u32)keys->dst ^ extra,
 				  (__force u32)keys->src,
 				  (__force u32)keys->ports);
 	if (!hash)
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct
  2015-02-28  3:11 [PATCH net-next 0/6] net: Strengthen TX and RX hashes Tom Herbert
  2015-02-28  3:11 ` [PATCH net-next 1/6] net: initialize sk_txhash using random value Tom Herbert
  2015-02-28  3:11 ` [PATCH net-next 2/6] flow_dissector: Include ip_proto in hash computation Tom Herbert
@ 2015-02-28  3:11 ` Tom Herbert
  2015-02-28  7:37   ` Eric Dumazet
  2015-02-28  3:11 ` [PATCH net-next 4/6] flow_dissector: Include VLAN ID in hash computation Tom Herbert
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Tom Herbert @ 2015-02-28  3:11 UTC (permalink / raw)
  To: davem, netdev

This will be used for additional input into the hash computation
such as VLAN ID or GRE keyid.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/net/flow_keys.h   | 1 +
 include/net/sch_generic.h | 2 +-
 net/core/flow_dissector.c | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
index dc8fd81..d498ee8 100644
--- a/include/net/flow_keys.h
+++ b/include/net/flow_keys.h
@@ -24,6 +24,7 @@ struct flow_keys {
 	};
 	u16	thoff;
 	__be16	n_proto;
+	__be32	hash_extra;
 	u8	ip_proto;
 };
 
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index c605d30..d41a034 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -252,7 +252,7 @@ struct qdisc_skb_cb {
 	unsigned int		pkt_len;
 	u16			slave_dev_queue_mapping;
 	u16			_pad;
-#define QDISC_CB_PRIV_LEN 20
+#define QDISC_CB_PRIV_LEN 24
 	unsigned char		data[QDISC_CB_PRIV_LEN];
 };
 
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index f73a248..84bb794 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -286,7 +286,7 @@ static inline u32 __flow_hash_from_keys(struct flow_keys *keys)
 		swap(keys->port16[0], keys->port16[1]);
 	}
 
-	extra = keys->ip_proto;
+	extra = keys->ip_proto ^ keys->hash_extra;
 	hash = __flow_hash_3words((__force u32)keys->dst ^ extra,
 				  (__force u32)keys->src,
 				  (__force u32)keys->ports);
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 4/6] flow_dissector: Include VLAN ID in hash computation
  2015-02-28  3:11 [PATCH net-next 0/6] net: Strengthen TX and RX hashes Tom Herbert
                   ` (2 preceding siblings ...)
  2015-02-28  3:11 ` [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct Tom Herbert
@ 2015-02-28  3:11 ` Tom Herbert
  2015-02-28  3:11 ` [PATCH net-next 5/6] flow_dissector: Include GRE keyid " Tom Herbert
  2015-02-28  3:11 ` [PATCH net-next 6/6] flow_dissector: Include MPLS entropy label " Tom Herbert
  5 siblings, 0 replies; 18+ messages in thread
From: Tom Herbert @ 2015-02-28  3:11 UTC (permalink / raw)
  To: davem, netdev

If VLAN is present include that in hash_extra in flow_keys when
dissecting. This will then be included in hash computation.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/core/flow_dissector.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 84bb794..b11eec1 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -155,6 +155,7 @@ ipv6:
 		if (!vlan)
 			return false;
 
+		flow->hash_extra ^= vlan->h_vlan_TCI;
 		proto = vlan->h_vlan_encapsulated_proto;
 		nhoff += sizeof(*vlan);
 		goto again;
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 5/6] flow_dissector: Include GRE keyid in hash computation
  2015-02-28  3:11 [PATCH net-next 0/6] net: Strengthen TX and RX hashes Tom Herbert
                   ` (3 preceding siblings ...)
  2015-02-28  3:11 ` [PATCH net-next 4/6] flow_dissector: Include VLAN ID in hash computation Tom Herbert
@ 2015-02-28  3:11 ` Tom Herbert
  2015-02-28  3:11 ` [PATCH net-next 6/6] flow_dissector: Include MPLS entropy label " Tom Herbert
  5 siblings, 0 replies; 18+ messages in thread
From: Tom Herbert @ 2015-02-28  3:11 UTC (permalink / raw)
  To: davem, netdev

If keyid is present include that in hash_extra in flow_keys when
dissecting. This will then be included in hash computation.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/core/flow_dissector.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index b11eec1..e04946f 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -219,8 +219,21 @@ ipv6:
 			nhoff += 4;
 			if (hdr->flags & GRE_CSUM)
 				nhoff += 4;
-			if (hdr->flags & GRE_KEY)
+			if (hdr->flags & GRE_KEY) {
+				const __be32 *keyid;
+				__be32 _keyid;
+
+				keyid = __skb_header_pointer(skb, nhoff,
+							     sizeof(_keyid),
+							     data, hlen,
+							     &_keyid);
+				if (!keyid)
+					return false;
+
+				flow->hash_extra ^= *keyid;
+
 				nhoff += 4;
+			}
 			if (hdr->flags & GRE_SEQ)
 				nhoff += 4;
 			if (proto == htons(ETH_P_TEB)) {
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 6/6] flow_dissector: Include MPLS entropy label in hash computation
  2015-02-28  3:11 [PATCH net-next 0/6] net: Strengthen TX and RX hashes Tom Herbert
                   ` (4 preceding siblings ...)
  2015-02-28  3:11 ` [PATCH net-next 5/6] flow_dissector: Include GRE keyid " Tom Herbert
@ 2015-02-28  3:11 ` Tom Herbert
  5 siblings, 0 replies; 18+ messages in thread
From: Tom Herbert @ 2015-02-28  3:11 UTC (permalink / raw)
  To: davem, netdev

This patch add MPLS cases in flow dissection. If the top labels are
entropy labels (entropy label next indicator label followed by an
entropy label) the entropy value is extracted and used as a
substitute for ports in flow keys.

Entropy labels for MPLS are described in RFC 6790.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/uapi/linux/in.h   |  2 ++
 include/uapi/linux/mpls.h |  3 +++
 net/core/flow_dissector.c | 24 ++++++++++++++++++++++++
 3 files changed, 29 insertions(+)

diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index 589ced0..5f9bc1f 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -69,6 +69,8 @@ enum {
 #define IPPROTO_SCTP		IPPROTO_SCTP
   IPPROTO_UDPLITE = 136,	/* UDP-Lite (RFC 3828)			*/
 #define IPPROTO_UDPLITE		IPPROTO_UDPLITE
+  IPPROTO_MPLS = 137,		/* MPLS (RFC 4023)			*/
+#define IPPROTO_MPLS		IPPROTO_MPLS
   IPPROTO_RAW = 255,		/* Raw IP packets			*/
 #define IPPROTO_RAW		IPPROTO_RAW
   IPPROTO_MAX
diff --git a/include/uapi/linux/mpls.h b/include/uapi/linux/mpls.h
index bc9abfe..3be9eb7 100644
--- a/include/uapi/linux/mpls.h
+++ b/include/uapi/linux/mpls.h
@@ -31,4 +31,7 @@ struct mpls_label {
 #define MPLS_LS_TTL_MASK        0x000000FF
 #define MPLS_LS_TTL_SHIFT       0
 
+/* Reserved labels */
+#define MPLS_LS_ENTROPY_IS_NEXT	7
+
 #endif /* _UAPI_MPLS_H */
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index e04946f..c4c35cd 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -7,6 +7,7 @@
 #include <net/ipv6.h>
 #include <linux/igmp.h>
 #include <linux/icmp.h>
+#include <linux/mpls.h>
 #include <linux/sctp.h>
 #include <linux/dccp.h>
 #include <linux/if_tunnel.h>
@@ -193,6 +194,26 @@ ipv6:
 		flow->thoff = (u16)nhoff;
 		return true;
 	}
+	case htons(ETH_P_MPLS_UC):
+	case htons(ETH_P_MPLS_MC): {
+		struct mpls_label *hdr, _hdr[2];
+mpls:
+		hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data,
+					   hlen, &_hdr);
+		if (!hdr)
+			return false;
+		if ((ntohl(hdr[0].entry) & MPLS_LS_LABEL_MASK) ==
+		    MPLS_LS_ENTROPY_IS_NEXT) {
+			flow->n_proto = proto;
+			flow->ip_proto = ip_proto;
+			flow->ports = ntohl(hdr[1].entry) & MPLS_LS_LABEL_MASK;
+			flow->thoff = (u16)nhoff;
+
+			return true;
+		} else {
+			return false;
+		}
+	}
 	case htons(ETH_P_FCOE):
 		flow->thoff = (u16)(nhoff + FCOE_HEADER_LEN);
 		/* fall through */
@@ -258,6 +279,9 @@ ipv6:
 	case IPPROTO_IPV6:
 		proto = htons(ETH_P_IPV6);
 		goto ipv6;
+	case IPPROTO_MPLS:
+		proto = htons(ETH_P_MPLS_UC);
+		goto mpls;
 	default:
 		break;
 	}
-- 
2.2.0.rc0.207.ga3a616c

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct
  2015-02-28  3:11 ` [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct Tom Herbert
@ 2015-02-28  7:37   ` Eric Dumazet
  2015-02-28 20:31     ` Florian Westphal
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2015-02-28  7:37 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev

On Fri, 2015-02-27 at 19:11 -0800, Tom Herbert wrote:

> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index c605d30..d41a034 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -252,7 +252,7 @@ struct qdisc_skb_cb {
>  	unsigned int		pkt_len;
>  	u16			slave_dev_queue_mapping;
>  	u16			_pad;
> -#define QDISC_CB_PRIV_LEN 20
> +#define QDISC_CB_PRIV_LEN 24
>  	unsigned char		data[QDISC_CB_PRIV_LEN];
>  };
>  

This change breaks kernel build : We already are at the cb[] limit.

Please check commit 257117862634d89de33fec74858b1a0ba5ab444b
("net: sched: shrink struct qdisc_skb_cb to 28 bytes")


                 from drivers/infiniband/ulp/ipoib/ipoib_main.c:35:
In function ‘ipoib_skb_cb’,
    inlined from ‘ipoib_hard_header’ at drivers/infiniband/ulp/ipoib/ipoib_main.c:816:19:
include/linux/compiler.h:424:20: error: call to ‘__compiletime_assert_136’ declared with attribute error: BUILD_BUG_ON failed: sizeof(skb->cb) < sizeof(struct ipoib_cb)
    prefix ## suffix();    \
                    ^
include/linux/compiler.h:429:2: note: in expansion of macro ‘__compiletime_assert’
  __compiletime_assert(condition, msg, prefix, suffix)
  ^
include/linux/compiler.h:441:2: note: in expansion of macro ‘_compiletime_assert’
  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
  ^
include/linux/bug.h:50:37: note: in expansion of macro ‘compiletime_assert’
 #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                     ^
include/linux/bug.h:74:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
  ^
drivers/infiniband/ulp/ipoib/ipoib.h:136:2: note: in expansion of macro ‘BUILD_BUG_ON’
  BUILD_BUG_ON(sizeof(skb->cb) < sizeof(struct ipoib_cb));
  ^
  CC      drivers/parport/ieee1284.o
  CC      drivers/net/arcnet/arc-rawmode.o
  CC      drivers/misc/c2port/core.o
  CC      drivers/mfd/rtsx_pcr.o
In function ‘ipoib_skb_cb’,
    inlined from ‘ipoib_start_xmit’ at drivers/infiniband/ulp/ipoib/ipoib_main.c:719:19:
include/linux/compiler.h:424:20: error: call to ‘__compiletime_assert_136’ declared with attribute error: BUILD_BUG_ON failed: sizeof(skb->cb) < sizeof(struct ipoib_cb)
    prefix ## suffix();    \
                    ^
include/linux/compiler.h:429:2: note: in expansion of macro ‘__compiletime_assert’
  __compiletime_assert(condition, msg, prefix, suffix)
  ^
include/linux/compiler.h:441:2: note: in expansion of macro ‘_compiletime_assert’
  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
  ^
include/linux/bug.h:50:37: note: in expansion of macro ‘compiletime_assert’
 #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                     ^
include/linux/bug.h:74:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
  ^
drivers/infiniband/ulp/ipoib/ipoib.h:136:2: note: in expansion of macro ‘BUILD_BUG_ON’
  BUILD_BUG_ON(sizeof(skb->cb) < sizeof(struct ipoib_cb));
  ^
make[4]: *** [drivers/infiniband/ulp/ipoib/ipoib_main.o] Error 1
make[3]: *** [drivers/infiniband/ulp/ipoib] Error 2
make[2]: *** [drivers/infiniband/ulp] Error 2
make[1]: *** [drivers/infiniband] Error 2
make[1]: *** Waiting for unfinished jobs....

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct
  2015-02-28  7:37   ` Eric Dumazet
@ 2015-02-28 20:31     ` Florian Westphal
  2015-02-28 20:46       ` Dave Taht
  2015-03-01 17:55       ` Tom Herbert
  0 siblings, 2 replies; 18+ messages in thread
From: Florian Westphal @ 2015-02-28 20:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, davem, netdev

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2015-02-27 at 19:11 -0800, Tom Herbert wrote:
> 
> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > index c605d30..d41a034 100644
> > --- a/include/net/sch_generic.h
> > +++ b/include/net/sch_generic.h
> > @@ -252,7 +252,7 @@ struct qdisc_skb_cb {
> >  	unsigned int		pkt_len;
> >  	u16			slave_dev_queue_mapping;
> >  	u16			_pad;
> > -#define QDISC_CB_PRIV_LEN 20
> > +#define QDISC_CB_PRIV_LEN 24
> >  	unsigned char		data[QDISC_CB_PRIV_LEN];
> >  };
> >  
> 
> This change breaks kernel build : We already are at the cb[] limit.
> 
> Please check commit 257117862634d89de33fec74858b1a0ba5ab444b
> ("net: sched: shrink struct qdisc_skb_cb to 28 bytes")

I've been toying around with reducing skb->cb[] to 44 bytes,
Seems Tom could integrate following patch from my test branch:

http://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=skb_cb_44_01&id=29d711e1a71244b71940c2d1e346500bef4d6670

It makes sfq use a smaller flow key state.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct
  2015-02-28 20:31     ` Florian Westphal
@ 2015-02-28 20:46       ` Dave Taht
  2015-03-01 18:16         ` Tom Herbert
  2015-03-01 17:55       ` Tom Herbert
  1 sibling, 1 reply; 18+ messages in thread
From: Dave Taht @ 2015-02-28 20:46 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Eric Dumazet, Tom Herbert, davem, netdev

On Sat, Feb 28, 2015 at 12:31 PM, Florian Westphal <fw@strlen.de> wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Fri, 2015-02-27 at 19:11 -0800, Tom Herbert wrote:
>>
>> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> > index c605d30..d41a034 100644
>> > --- a/include/net/sch_generic.h
>> > +++ b/include/net/sch_generic.h
>> > @@ -252,7 +252,7 @@ struct qdisc_skb_cb {
>> >     unsigned int            pkt_len;
>> >     u16                     slave_dev_queue_mapping;
>> >     u16                     _pad;
>> > -#define QDISC_CB_PRIV_LEN 20
>> > +#define QDISC_CB_PRIV_LEN 24
>> >     unsigned char           data[QDISC_CB_PRIV_LEN];
>> >  };
>> >
>>
>> This change breaks kernel build : We already are at the cb[] limit.
>>
>> Please check commit 257117862634d89de33fec74858b1a0ba5ab444b
>> ("net: sched: shrink struct qdisc_skb_cb to 28 bytes")
>
> I've been toying around with reducing skb->cb[] to 44 bytes,
> Seems Tom could integrate following patch from my test branch:
>
> http://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=skb_cb_44_01&id=29d711e1a71244b71940c2d1e346500bef4d6670
>
> It makes sfq use a smaller flow key state.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

My concern with all this work is that you are possibly not looking at
the quality of the hash
as the number of queues goes down, or the effects of adding in all
this extra stuff
to the hash is, in cases where they don't exist, or are not very random.

The default, in fq_codel is 1024 queues, and that worked pretty good
in monty carlo simulations, but I have always felt it could be better
after we measured more real traffic - there is not a lot of
information in the proto field in real traffic, and - although it has
been improved - the ipv6 hash was kind of weak originally and a little
odd now.

As some are attempting to deploy these hashes with 64, 32 and even 8
queues, I would hope that someone (and I can if I get the time) would
look closely at avalanche effects down to these last few bits.

http://en.wikipedia.org/wiki/Avalanche_effect

-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct
  2015-02-28 20:31     ` Florian Westphal
  2015-02-28 20:46       ` Dave Taht
@ 2015-03-01 17:55       ` Tom Herbert
  2015-03-01 18:24         ` Florian Westphal
  2015-03-01 21:27         ` Eric Dumazet
  1 sibling, 2 replies; 18+ messages in thread
From: Tom Herbert @ 2015-03-01 17:55 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Eric Dumazet, David Miller, Linux Netdev List

On Sat, Feb 28, 2015 at 12:31 PM, Florian Westphal <fw@strlen.de> wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Fri, 2015-02-27 at 19:11 -0800, Tom Herbert wrote:
>>
>> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> > index c605d30..d41a034 100644
>> > --- a/include/net/sch_generic.h
>> > +++ b/include/net/sch_generic.h
>> > @@ -252,7 +252,7 @@ struct qdisc_skb_cb {
>> >     unsigned int            pkt_len;
>> >     u16                     slave_dev_queue_mapping;
>> >     u16                     _pad;
>> > -#define QDISC_CB_PRIV_LEN 20
>> > +#define QDISC_CB_PRIV_LEN 24
>> >     unsigned char           data[QDISC_CB_PRIV_LEN];
>> >  };
>> >
>>
>> This change breaks kernel build : We already are at the cb[] limit.
>>
>> Please check commit 257117862634d89de33fec74858b1a0ba5ab444b
>> ("net: sched: shrink struct qdisc_skb_cb to 28 bytes")
>
> I've been toying around with reducing skb->cb[] to 44 bytes,
> Seems Tom could integrate following patch from my test branch:
>
> http://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=skb_cb_44_01&id=29d711e1a71244b71940c2d1e346500bef4d6670
>
> It makes sfq use a smaller flow key state.

Alternatively, I think we might be able to eliminate the use of
flow_keys and flow_dissect from the qdisc code altogether. It looks
like this is only being used to determine a hash over the addresses,
ports, and protocol so I am thinking that we can just call
skb_get_hash for that. Will try to post some patches soon.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct
  2015-02-28 20:46       ` Dave Taht
@ 2015-03-01 18:16         ` Tom Herbert
  2015-03-01 20:09           ` Dave Taht
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Herbert @ 2015-03-01 18:16 UTC (permalink / raw)
  To: Dave Taht; +Cc: Florian Westphal, Eric Dumazet, davem, netdev

On Sat, Feb 28, 2015 at 12:46 PM, Dave Taht <dave.taht@gmail.com> wrote:
> On Sat, Feb 28, 2015 at 12:31 PM, Florian Westphal <fw@strlen.de> wrote:
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> On Fri, 2015-02-27 at 19:11 -0800, Tom Herbert wrote:
>>>
>>> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>>> > index c605d30..d41a034 100644
>>> > --- a/include/net/sch_generic.h
>>> > +++ b/include/net/sch_generic.h
>>> > @@ -252,7 +252,7 @@ struct qdisc_skb_cb {
>>> >     unsigned int            pkt_len;
>>> >     u16                     slave_dev_queue_mapping;
>>> >     u16                     _pad;
>>> > -#define QDISC_CB_PRIV_LEN 20
>>> > +#define QDISC_CB_PRIV_LEN 24
>>> >     unsigned char           data[QDISC_CB_PRIV_LEN];
>>> >  };
>>> >
>>>
>>> This change breaks kernel build : We already are at the cb[] limit.
>>>
>>> Please check commit 257117862634d89de33fec74858b1a0ba5ab444b
>>> ("net: sched: shrink struct qdisc_skb_cb to 28 bytes")
>>
>> I've been toying around with reducing skb->cb[] to 44 bytes,
>> Seems Tom could integrate following patch from my test branch:
>>
>> http://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=skb_cb_44_01&id=29d711e1a71244b71940c2d1e346500bef4d6670
>>
>> It makes sfq use a smaller flow key state.
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> My concern with all this work is that you are possibly not looking at
> the quality of the hash
> as the number of queues goes down, or the effects of adding in all
> this extra stuff
> to the hash is, in cases where they don't exist, or are not very random.
>
> The default, in fq_codel is 1024 queues, and that worked pretty good
> in monty carlo simulations, but I have always felt it could be better
> after we measured more real traffic - there is not a lot of
> information in the proto field in real traffic, and - although it has
> been improved - the ipv6 hash was kind of weak originally and a little
> odd now.
>
> As some are attempting to deploy these hashes with 64, 32 and even 8
> queues, I would hope that someone (and I can if I get the time) would
> look closely at avalanche effects down to these last few bits.
>
> http://en.wikipedia.org/wiki/Avalanche_effect
>
We are only increasing the input to the hash function by XOR not
reducing, so it seems unlikely this could result in less entropy. In
worse case extra input would might have no effect. As for the
avalanche effect that is more dependent on the hash function itself.
In the kernel we are using Jenkin's hash for such things, and there's
a nice graphical representation for the avalanche effect in the
wikipedia page:

http://en.wikipedia.org/wiki/Jenkins_hash_function


> --
> Dave Täht
> Let's make wifi fast, less jittery and reliable again!
>
> https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct
  2015-03-01 17:55       ` Tom Herbert
@ 2015-03-01 18:24         ` Florian Westphal
  2015-03-01 19:17           ` Tom Herbert
  2015-03-01 19:57           ` Dave Taht
  2015-03-01 21:27         ` Eric Dumazet
  1 sibling, 2 replies; 18+ messages in thread
From: Florian Westphal @ 2015-03-01 18:24 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Florian Westphal, Eric Dumazet, David Miller, Linux Netdev List

Tom Herbert <therbert@google.com> wrote:
> On Sat, Feb 28, 2015 at 12:31 PM, Florian Westphal <fw@strlen.de> wrote:
> > I've been toying around with reducing skb->cb[] to 44 bytes,
> > Seems Tom could integrate following patch from my test branch:
> >
> > http://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=skb_cb_44_01&id=29d711e1a71244b71940c2d1e346500bef4d6670
> >
> > It makes sfq use a smaller flow key state.
> 
> Alternatively, I think we might be able to eliminate the use of
> flow_keys and flow_dissect from the qdisc code altogether. It looks
> like this is only being used to determine a hash over the addresses,
> ports, and protocol so I am thinking that we can just call
> skb_get_hash for that. Will try to post some patches soon.

The problem with this is that you'll lose the secret input to jhash
in sfq_hash().

assuming you have packets p1 and p2 (from different flows)
with skb_get_hash(p1) == skb_get_hash(p2) those flows share same
queue/bin forever as the hash pertubation will no longer work.

For sfq, hash collisions may exist as well but they'll be resolved
after some time when q->perturbation (its part of hash input) is reseeded.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct
  2015-03-01 18:24         ` Florian Westphal
@ 2015-03-01 19:17           ` Tom Herbert
  2015-03-01 19:43             ` Florian Westphal
  2015-03-01 19:57           ` Dave Taht
  1 sibling, 1 reply; 18+ messages in thread
From: Tom Herbert @ 2015-03-01 19:17 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Eric Dumazet, David Miller, Linux Netdev List

On Sun, Mar 1, 2015 at 10:24 AM, Florian Westphal <fw@strlen.de> wrote:
> Tom Herbert <therbert@google.com> wrote:
>> On Sat, Feb 28, 2015 at 12:31 PM, Florian Westphal <fw@strlen.de> wrote:
>> > I've been toying around with reducing skb->cb[] to 44 bytes,
>> > Seems Tom could integrate following patch from my test branch:
>> >
>> > http://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=skb_cb_44_01&id=29d711e1a71244b71940c2d1e346500bef4d6670
>> >
>> > It makes sfq use a smaller flow key state.
>>
>> Alternatively, I think we might be able to eliminate the use of
>> flow_keys and flow_dissect from the qdisc code altogether. It looks
>> like this is only being used to determine a hash over the addresses,
>> ports, and protocol so I am thinking that we can just call
>> skb_get_hash for that. Will try to post some patches soon.
>
> The problem with this is that you'll lose the secret input to jhash
> in sfq_hash().
>
> assuming you have packets p1 and p2 (from different flows)
> with skb_get_hash(p1) == skb_get_hash(p2) those flows share same
> queue/bin forever as the hash pertubation will no longer work.
>
We still need hash perturbation for the mapping to a small number of
queues which can be done after retrieving skb_get_hash, but the
probability that two different flows match perfectly in skb_get_hash()
should be 1/2^32-- so are hash collisions really a concern here? Note
that we already lose information in the IPv6 address fold in
flow_dissect and don't include VLAN or VNID yet in the flow_keys yet,
so these probably already make a greater probability in a tuple
collision for sfc hash.

> For sfq, hash collisions may exist as well but they'll be resolved
> after some time when q->perturbation (its part of hash input) is reseeded.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct
  2015-03-01 19:17           ` Tom Herbert
@ 2015-03-01 19:43             ` Florian Westphal
  0 siblings, 0 replies; 18+ messages in thread
From: Florian Westphal @ 2015-03-01 19:43 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Florian Westphal, Eric Dumazet, David Miller, Linux Netdev List

Tom Herbert <therbert@google.com> wrote:
> We still need hash perturbation for the mapping to a small number of
> queues which can be done after retrieving skb_get_hash, but the
> probability that two different flows match perfectly in skb_get_hash()
> should be 1/2^32-- so are hash collisions really a concern here?

I'm not concerned about accidental collisions, how predictable is skb_get_hash()?
Is skb_get_hash() guaranteed to e.g. contain L4 information?

AFAIK answer to both is "depends on nic/driver", so I feel its better to use
software flow dissector.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct
  2015-03-01 18:24         ` Florian Westphal
  2015-03-01 19:17           ` Tom Herbert
@ 2015-03-01 19:57           ` Dave Taht
  1 sibling, 0 replies; 18+ messages in thread
From: Dave Taht @ 2015-03-01 19:57 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Tom Herbert, Eric Dumazet, David Miller, Linux Netdev List

On Sun, Mar 1, 2015 at 10:24 AM, Florian Westphal <fw@strlen.de> wrote:
> Tom Herbert <therbert@google.com> wrote:
>> On Sat, Feb 28, 2015 at 12:31 PM, Florian Westphal <fw@strlen.de> wrote:
>> > I've been toying around with reducing skb->cb[] to 44 bytes,
>> > Seems Tom could integrate following patch from my test branch:
>> >
>> > http://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=skb_cb_44_01&id=29d711e1a71244b71940c2d1e346500bef4d6670
>> >
>> > It makes sfq use a smaller flow key state.
>>
>> Alternatively, I think we might be able to eliminate the use of
>> flow_keys and flow_dissect from the qdisc code altogether. It looks
>> like this is only being used to determine a hash over the addresses,
>> ports, and protocol so I am thinking that we can just call
>> skb_get_hash for that. Will try to post some patches soon.
>
> The problem with this is that you'll lose the secret input to jhash
> in sfq_hash().

fq_codel relies on a secret input to the hash also, but it is only
created at instantation time. Retaining that secret input is critical
- otherwise an attacker can fairly easily derive the hash and find
ways of flooding the connection at low rates that are almost as
effective as simple attacks against pfifo_fast.

A note on sfq: it is commonly deployed with perturb 10 - which acts as
a "poor man's AQM" scrambling big flows and causing tcp cwnd
reductions - or at least it used to before linux gained much better
resilience in re-ordering. Either behavior - blowing up tcp flows
(then) - or allowing them to overbuffer (now) is pretty undesirable.

I have gone in detail on everything that is wrong with common SFQ
deployments here, commenting extensively on what was once, one of the
most popular shapers in the world:

http://www.bufferbloat.net/projects/cerowrt/wiki/Wondershaper_Must_Die

> assuming you have packets p1 and p2 (from different flows)
> with skb_get_hash(p1) == skb_get_hash(p2) those flows share same
> queue/bin forever as the hash pertubation will no longer work.
>
> For sfq, hash collisions may exist as well but they'll be resolved
> after some time when q->perturbation (its part of hash input) is reseeded.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct
  2015-03-01 18:16         ` Tom Herbert
@ 2015-03-01 20:09           ` Dave Taht
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Taht @ 2015-03-01 20:09 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Florian Westphal, Eric Dumazet, davem, netdev

On Sun, Mar 1, 2015 at 10:16 AM, Tom Herbert <therbert@google.com> wrote:
> On Sat, Feb 28, 2015 at 12:46 PM, Dave Taht <dave.taht@gmail.com> wrote:
>> On Sat, Feb 28, 2015 at 12:31 PM, Florian Westphal <fw@strlen.de> wrote:
>>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> On Fri, 2015-02-27 at 19:11 -0800, Tom Herbert wrote:
>>>>
>>>> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>>>> > index c605d30..d41a034 100644
>>>> > --- a/include/net/sch_generic.h
>>>> > +++ b/include/net/sch_generic.h
>>>> > @@ -252,7 +252,7 @@ struct qdisc_skb_cb {
>>>> >     unsigned int            pkt_len;
>>>> >     u16                     slave_dev_queue_mapping;
>>>> >     u16                     _pad;
>>>> > -#define QDISC_CB_PRIV_LEN 20
>>>> > +#define QDISC_CB_PRIV_LEN 24
>>>> >     unsigned char           data[QDISC_CB_PRIV_LEN];
>>>> >  };
>>>> >
>>>>
>>>> This change breaks kernel build : We already are at the cb[] limit.
>>>>
>>>> Please check commit 257117862634d89de33fec74858b1a0ba5ab444b
>>>> ("net: sched: shrink struct qdisc_skb_cb to 28 bytes")
>>>
>>> I've been toying around with reducing skb->cb[] to 44 bytes,
>>> Seems Tom could integrate following patch from my test branch:
>>>
>>> http://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=skb_cb_44_01&id=29d711e1a71244b71940c2d1e346500bef4d6670
>>>
>>> It makes sfq use a smaller flow key state.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> My concern with all this work is that you are possibly not looking at
>> the quality of the hash
>> as the number of queues goes down, or the effects of adding in all
>> this extra stuff
>> to the hash is, in cases where they don't exist, or are not very random.
>>
>> The default, in fq_codel is 1024 queues, and that worked pretty good
>> in monty carlo simulations, but I have always felt it could be better
>> after we measured more real traffic - there is not a lot of
>> information in the proto field in real traffic, and - although it has
>> been improved - the ipv6 hash was kind of weak originally and a little
>> odd now.
>>
>> As some are attempting to deploy these hashes with 64, 32 and even 8
>> queues, I would hope that someone (and I can if I get the time) would
>> look closely at avalanche effects down to these last few bits.
>>
>> http://en.wikipedia.org/wiki/Avalanche_effect
>>
> We are only increasing the input to the hash function by XOR not
> reducing, so it seems unlikely this could result in less entropy. In
> worse case extra input would might have no effect. As for the
> avalanche effect that is more dependent on the hash function itself.
> In the kernel we are using Jenkin's hash for such things, and there's
> a nice graphical representation for the avalanche effect in the
> wikipedia page:
>
> http://en.wikipedia.org/wiki/Jenkins_hash_function

I did not say you were wrong! I just said you were making me nervous. :)

Hash functions are usually evaluated by tossing random data into them
and expecting random data all the way to the least significant bit.

In the networking case there is now a significant amount of data with
low entropy tossed into the function. I would be happier from a
theoretical perspective if you just tossed every input (like the full
ipv6 addresses) into the hash itself with no xor tricks and with some
care as to what low entropy sources were being used on the low order
bits.

As an example, you get 2 bits of data from the remote port truly mixed
in at 1024 queues, and yes, jenkins should avalanche that, but I
really would prefer it be evaluated on various forms of real traffic,
not random data.

And there might be other hash functions besides jenkins better or
faster now. Although I have an interest in such things, I generally
lack time to play with the way coool! new stuff like

http://www.burtleburtle.net/bob/hash/spooky.html

and:

https://code.google.com/p/smhasher/wiki/MurmurHash

Certainly it doesn't generally matter what hash is used, so long as it
is correctly responsive to its inputs, and fast.

>
>> --
>> Dave Täht
>> Let's make wifi fast, less jittery and reliable again!
>>
>> https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb



-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct
  2015-03-01 17:55       ` Tom Herbert
  2015-03-01 18:24         ` Florian Westphal
@ 2015-03-01 21:27         ` Eric Dumazet
  1 sibling, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2015-03-01 21:27 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Florian Westphal, David Miller, Linux Netdev List

On Sun, 2015-03-01 at 09:55 -0800, Tom Herbert wrote:

> Alternatively, I think we might be able to eliminate the use of
> flow_keys and flow_dissect from the qdisc code altogether. It looks
> like this is only being used to determine a hash over the addresses,
> ports, and protocol so I am thinking that we can just call
> skb_get_hash for that. Will try to post some patches soon.

Note that SFQ only stored the keys in case of a potential rehash,
and because skb->cb[] was available then.

As the rehash implies to _not_ use skb_get_hash() it is simply better to
recompute the flow key from packet, in the unlikely case we do have a
rehash.

Rehashing SFQ adds reorders, so people should think twice before using
it.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-03-01 21:27 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-28  3:11 [PATCH net-next 0/6] net: Strengthen TX and RX hashes Tom Herbert
2015-02-28  3:11 ` [PATCH net-next 1/6] net: initialize sk_txhash using random value Tom Herbert
2015-02-28  3:11 ` [PATCH net-next 2/6] flow_dissector: Include ip_proto in hash computation Tom Herbert
2015-02-28  3:11 ` [PATCH net-next 3/6] flow_dissector: Add hash_extra field to flow_keys struct Tom Herbert
2015-02-28  7:37   ` Eric Dumazet
2015-02-28 20:31     ` Florian Westphal
2015-02-28 20:46       ` Dave Taht
2015-03-01 18:16         ` Tom Herbert
2015-03-01 20:09           ` Dave Taht
2015-03-01 17:55       ` Tom Herbert
2015-03-01 18:24         ` Florian Westphal
2015-03-01 19:17           ` Tom Herbert
2015-03-01 19:43             ` Florian Westphal
2015-03-01 19:57           ` Dave Taht
2015-03-01 21:27         ` Eric Dumazet
2015-02-28  3:11 ` [PATCH net-next 4/6] flow_dissector: Include VLAN ID in hash computation Tom Herbert
2015-02-28  3:11 ` [PATCH net-next 5/6] flow_dissector: Include GRE keyid " Tom Herbert
2015-02-28  3:11 ` [PATCH net-next 6/6] flow_dissector: Include MPLS entropy label " Tom Herbert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.