linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net: improve ipv4 performances
@ 2018-04-01 18:31 Anton Gary Ceph
  2018-04-01 18:50 ` Stephen Hemminger
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Anton Gary Ceph @ 2018-04-01 18:31 UTC (permalink / raw)
  To: netdev, linux-kernel

As the Linux networking stack is growing, more and more protocols are
added, increasing the complexity of stack itself.
Modern processors, contrary to common belief, are very bad in branch
prediction, so it's our task to give hints to the compiler when possible.

After a few profiling and analysis, turned out that the ethertype field
of the packets has the following distribution:

    92.1% ETH_P_IP
     3.2% ETH_P_ARP
     2.7% ETH_P_8021Q
     1.4% ETH_P_PPP_SES
     0.6% don't know/no opinion

>From a projection on statistics collected by Google about IPv6 adoption[1],
IPv6 should peak at 25% usage at the beginning of 2030. Hence, we should
give proper hints to the compiler about the low IPv6 usage.

Here is an iperf3 run before and after the patch:

Before:
[ ID]  Interval           Transfer    Bandwidth       Retr
[  4]  0.00-100.00 sec    100 GBytes  8.60 Gbits/sec  0       sender
[  4]  0.00-100.00 sec    100 GBytes  8.60 Gbits/sec          receiver

After
[ ID]  Interval           Transfer    Bandwidth       Retr
[  4]  0.00-100.00 sec    109 GBytes  9.35 Gbits/sec  0       sender
[  4]  0.00-100.00 sec    109 GBytes  9.35 Gbits/sec          receiver

[1] https://www.google.com/intl/en/ipv6/statistics.html

Signed-off-by: Anton Gary Ceph <agaceph@gmail.com>
---
 drivers/net/bonding/bond_main.c    |  2 +-
 drivers/net/ipvlan/ipvlan_core.c   |  2 +-
 drivers/net/vxlan.c                |  6 +++---
 include/linux/netdevice.h          |  2 +-
 include/net/ip_tunnels.h           |  2 +-
 include/net/netfilter/nf_queue.h   |  4 ++--
 net/bridge/br_device.c             |  2 +-
 net/bridge/br_input.c              |  2 +-
 net/bridge/br_mdb.c                |  5 +++--
 net/bridge/br_multicast.c          | 18 +++++++++---------
 net/bridge/br_netfilter_hooks.c    |  9 +++++----
 net/bridge/br_private.h            |  2 +-
 net/core/dev.c                     |  2 +-
 net/core/filter.c                  |  8 ++++----
 net/core/skbuff.c                  |  2 +-
 net/core/tso.c                     |  2 +-
 net/ipv4/ip_gre.c                  |  6 +++---
 net/ipv4/ip_tunnel.c               | 12 ++++++------
 net/ipv4/ping.c                    | 10 +++++-----
 net/ipv6/datagram.c                |  6 +++---
 net/netfilter/nf_flow_table_inet.c |  2 +-
 net/netfilter/nf_tables_netdev.c   |  2 +-
 net/netfilter/nfnetlink_queue.c    |  2 +-
 net/openvswitch/actions.c          |  2 +-
 net/openvswitch/conntrack.c        | 16 ++++++++--------
 net/openvswitch/flow.c             |  4 ++--
 net/openvswitch/flow.h             |  2 +-
 net/openvswitch/flow_netlink.c     | 18 +++++++++---------
 net/xfrm/xfrm_output.c             |  2 +-
 29 files changed, 78 insertions(+), 76 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index b7b113018853..b3ad2a8c1a08 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3222,7 +3222,7 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
 		noff += iph->ihl << 2;
 		if (!ip_is_fragment(iph))
 			proto = iph->protocol;
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+	} else if (unlikely(skb->protocol == htons(ETH_P_IPV6))) {
 		if (unlikely(!pskb_may_pull(skb, noff + sizeof(*iph6))))
 			return false;
 		iph6 = ipv6_hdr(skb);
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index c1f008fe4e1d..7344e2402003 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -480,7 +480,7 @@ static int ipvlan_process_outbound(struct sk_buff *skb)
 		skb_reset_network_header(skb);
 	}
 
-	if (skb->protocol == htons(ETH_P_IPV6))
+	if (unlikely(skb->protocol == htons(ETH_P_IPV6)))
 		ret = ipvlan_process_v6_outbound(skb);
 	else if (skb->protocol == htons(ETH_P_IP))
 		ret = ipvlan_process_v4_outbound(skb);
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index fab7a4db249e..8143b99e098f 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1694,7 +1694,7 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 		return false;
 
 	n = NULL;
-	switch (ntohs(eth_hdr(skb)->h_proto)) {
+	switch (__builtin_expect(ntohs(eth_hdr(skb)->h_proto), ETH_P_IP)) {
 	case ETH_P_IP:
 	{
 		struct iphdr *pip;
@@ -2274,7 +2274,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 		if (ntohs(eth->h_proto) == ETH_P_ARP)
 			return arp_reduce(dev, skb, vni);
 #if IS_ENABLED(CONFIG_IPV6)
-		else if (ntohs(eth->h_proto) == ETH_P_IPV6 &&
+		else if (unlikely(ntohs(eth->h_proto) == ETH_P_IPV6) &&
 			 pskb_may_pull(skb, sizeof(struct ipv6hdr) +
 					    sizeof(struct nd_msg)) &&
 			 ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) {
@@ -2293,7 +2293,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if (f && (f->flags & NTF_ROUTER) && (vxlan->cfg.flags & VXLAN_F_RSC) &&
 	    (ntohs(eth->h_proto) == ETH_P_IP ||
-	     ntohs(eth->h_proto) == ETH_P_IPV6)) {
+	     unlikely(ntohs(eth->h_proto) == ETH_P_IPV6))) {
 		did_rsc = route_shortcircuit(dev, skb);
 		if (did_rsc)
 			f = vxlan_find_mac(vxlan, eth->h_dest, vni);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5eef6c8e2741..c1a4820622f9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4031,7 +4031,7 @@ static inline bool can_checksum_protocol(netdev_features_t features,
 		return true;
 	}
 
-	switch (protocol) {
+	switch (__builtin_expect(protocol, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		return !!(features & NETIF_F_IP_CSUM);
 	case htons(ETH_P_IPV6):
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 1f16773cfd76..f837867ff3b7 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -355,7 +355,7 @@ static inline u8 ip_tunnel_get_dsfield(const struct iphdr *iph,
 {
 	if (skb->protocol == htons(ETH_P_IP))
 		return iph->tos;
-	else if (skb->protocol == htons(ETH_P_IPV6))
+	else if (unlikely(skb->protocol == htons(ETH_P_IPV6)))
 		return ipv6_get_dsfield((const struct ipv6hdr *)iph);
 	else
 		return 0;
diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index a50a69f5334c..c97b6a7719f4 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -79,7 +79,7 @@ static inline u32 hash_bridge(const struct sk_buff *skb, u32 initval)
 	struct ipv6hdr *ip6h, _ip6h;
 	struct iphdr *iph, _iph;
 
-	switch (eth_hdr(skb)->h_proto) {
+	switch (__builtin_expect(eth_hdr(skb)->h_proto, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		iph = skb_header_pointer(skb, skb_network_offset(skb),
 					 sizeof(*iph), &_iph);
@@ -101,7 +101,7 @@ static inline u32
 nfqueue_hash(const struct sk_buff *skb, u16 queue, u16 queues_total, u8 family,
 	     u32 initval)
 {
-	switch (family) {
+	switch (__builtin_expect(family, NFPROTO_IPV4)) {
 	case NFPROTO_IPV4:
 		queue += reciprocal_scale(hash_v4(ip_hdr(skb), initval),
 					  queues_total);
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 1285ca30ab0a..881c4bc794b9 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -70,7 +70,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	    br->neigh_suppress_enabled) {
 		br_do_proxy_suppress_arp(skb, br, vid, NULL);
 	} else if (IS_ENABLED(CONFIG_IPV6) &&
-		   skb->protocol == htons(ETH_P_IPV6) &&
+		   unlikely(skb->protocol == htons(ETH_P_IPV6)) &&
 		   br->neigh_suppress_enabled &&
 		   pskb_may_pull(skb, sizeof(struct ipv6hdr) +
 				 sizeof(struct nd_msg)) &&
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 7f98a7d25866..6b8e4d808424 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -120,7 +120,7 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 	     skb->protocol == htons(ETH_P_RARP))) {
 		br_do_proxy_suppress_arp(skb, br, vid, p);
 	} else if (IS_ENABLED(CONFIG_IPV6) &&
-		   skb->protocol == htons(ETH_P_IPV6) &&
+		   unlikely(skb->protocol == htons(ETH_P_IPV6)) &&
 		   br->neigh_suppress_enabled &&
 		   pskb_may_pull(skb, sizeof(struct ipv6hdr) +
 				 sizeof(struct nd_msg)) &&
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 6d9f48bd374a..4c019c8d6e22 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -128,7 +128,8 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
 				if (p->addr.proto == htons(ETH_P_IP))
 					e.addr.u.ip4 = p->addr.u.ip4;
 #if IS_ENABLED(CONFIG_IPV6)
-				if (p->addr.proto == htons(ETH_P_IPV6))
+				if (unlikely(p->addr.proto ==
+					     htons(ETH_P_IPV6)))
 					e.addr.u.ip6 = p->addr.u.ip6;
 #endif
 				e.addr.proto = p->addr.proto;
@@ -488,7 +489,7 @@ static bool is_valid_mdb_entry(struct br_mdb_entry *entry)
 		if (ipv4_is_local_multicast(entry->addr.u.ip4))
 			return false;
 #if IS_ENABLED(CONFIG_IPV6)
-	} else if (entry->addr.proto == htons(ETH_P_IPV6)) {
+	} else if (unlikely(entry->addr.proto == htons(ETH_P_IPV6))) {
 		if (ipv6_addr_is_ll_all_nodes(&entry->addr.u.ip6))
 			return false;
 #endif
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index cb4729539b82..1c978838b81a 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -62,7 +62,7 @@ static inline int br_ip_equal(const struct br_ip *a, const struct br_ip *b)
 		return 0;
 	if (a->vid != b->vid)
 		return 0;
-	switch (a->proto) {
+	switch (__builtin_expect(a->proto, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		return a->u.ip4 == b->u.ip4;
 #if IS_ENABLED(CONFIG_IPV6)
@@ -92,7 +92,7 @@ static inline int __br_ip6_hash(struct net_bridge_mdb_htable *mdb,
 static inline int br_ip_hash(struct net_bridge_mdb_htable *mdb,
 			     struct br_ip *ip)
 {
-	switch (ip->proto) {
+	switch (__builtin_expect(ip->proto, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		return __br_ip4_hash(mdb, ip->u.ip4, ip->vid);
 #if IS_ENABLED(CONFIG_IPV6)
@@ -167,7 +167,7 @@ struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
 	ip.proto = skb->protocol;
 	ip.vid = vid;
 
-	switch (skb->protocol) {
+	switch (__builtin_expect(skb->protocol, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		ip.u.ip4 = ip_hdr(skb)->daddr;
 		break;
@@ -577,7 +577,7 @@ static struct sk_buff *br_multicast_alloc_query(struct net_bridge *br,
 						struct br_ip *addr,
 						u8 *igmp_type)
 {
-	switch (addr->proto) {
+	switch (__builtin_expect(addr->proto, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		return br_ip4_multicast_alloc_query(br, addr->u.ip4, igmp_type);
 #if IS_ENABLED(CONFIG_IPV6)
@@ -1321,7 +1321,7 @@ static bool br_multicast_select_querier(struct net_bridge *br,
 					struct net_bridge_port *port,
 					struct br_ip *saddr)
 {
-	switch (saddr->proto) {
+	switch (__builtin_expect(saddr->proto, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		return br_ip4_multicast_select_querier(br, port, saddr->u.ip4);
 #if IS_ENABLED(CONFIG_IPV6)
@@ -1761,7 +1761,7 @@ static void br_multicast_err_count(const struct net_bridge *br,
 	pstats = this_cpu_ptr(stats);
 
 	u64_stats_update_begin(&pstats->syncp);
-	switch (proto) {
+	switch (__builtin_expect(proto, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		pstats->mstats.igmp_parse_errors++;
 		break;
@@ -1909,7 +1909,7 @@ int br_multicast_rcv(struct net_bridge *br, struct net_bridge_port *port,
 	if (br->multicast_disabled)
 		return 0;
 
-	switch (skb->protocol) {
+	switch (__builtin_expect(skb->protocol, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		ret = br_multicast_ipv4_rcv(br, port, skb, vid);
 		break;
@@ -2461,7 +2461,7 @@ bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto)
 
 	br = port->br;
 
-	switch (proto) {
+	switch (__builtin_expect(proto, ETH_P_IP)) {
 	case ETH_P_IP:
 		if (!timer_pending(&br->ip4_other_query.timer) ||
 		    rcu_dereference(br->ip4_querier.port) == port)
@@ -2493,7 +2493,7 @@ static void br_mcast_stats_add(struct bridge_mcast_stats __percpu *stats,
 	unsigned int t_len;
 
 	u64_stats_update_begin(&pstats->syncp);
-	switch (proto) {
+	switch (__builtin_expect(proto, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		t_len = ntohs(ip_hdr(skb)->tot_len) - ip_hdrlen(skb);
 		switch (type) {
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 9b16eaf33819..c622781eaa47 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -73,7 +73,8 @@ static int brnf_pass_vlan_indev __read_mostly;
 	(!skb_vlan_tag_present(skb) && skb->protocol == htons(ETH_P_IP))
 
 #define IS_IPV6(skb) \
-	(!skb_vlan_tag_present(skb) && skb->protocol == htons(ETH_P_IPV6))
+	(!skb_vlan_tag_present(skb) && \
+	 unlikely(skb->protocol == htons(ETH_P_IPV6)))
 
 #define IS_ARP(skb) \
 	(!skb_vlan_tag_present(skb) && skb->protocol == htons(ETH_P_ARP))
@@ -93,7 +94,7 @@ static inline __be16 vlan_proto(const struct sk_buff *skb)
 	 brnf_filter_vlan_tagged)
 
 #define IS_VLAN_IPV6(skb) \
-	(vlan_proto(skb) == htons(ETH_P_IPV6) && \
+	 unlikely(vlan_proto(skb) == htons(ETH_P_IPV6) && \
 	 brnf_filter_vlan_tagged)
 
 #define IS_VLAN_ARP(skb) \
@@ -534,7 +535,7 @@ static int br_nf_forward_finish(struct net *net, struct sock *sk, struct sk_buff
 		if (skb->protocol == htons(ETH_P_IP))
 			nf_bridge->frag_max_size = IPCB(skb)->frag_max_size;
 
-		if (skb->protocol == htons(ETH_P_IPV6))
+		if (unlikely(skb->protocol == htons(ETH_P_IPV6)))
 			nf_bridge->frag_max_size = IP6CB(skb)->frag_max_size;
 
 		in = nf_bridge->physindev;
@@ -749,7 +750,7 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
 		return br_nf_ip_fragment(net, sk, skb, br_nf_push_frag_xmit);
 	}
 	if (IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) &&
-	    skb->protocol == htons(ETH_P_IPV6)) {
+	    unlikely(skb->protocol == htons(ETH_P_IPV6))) {
 		const struct nf_ipv6_ops *v6ops = nf_get_ipv6_ops();
 		struct brnf_frag_data *data;
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 8e13a64d8c99..a208cc627662 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -686,7 +686,7 @@ __br_multicast_querier_exists(struct net_bridge *br,
 static inline bool br_multicast_querier_exists(struct net_bridge *br,
 					       struct ethhdr *eth)
 {
-	switch (eth->h_proto) {
+	switch (__builtin_expect(eth->h_proto, ETH_P_IP)) {
 	case (htons(ETH_P_IP)):
 		return __br_multicast_querier_exists(br,
 			&br->ip4_other_query, false);
diff --git a/net/core/dev.c b/net/core/dev.c
index ef0cc6ea5f8d..f829f0a68a94 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4395,7 +4395,7 @@ EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
  */
 static bool skb_pfmemalloc_protocol(struct sk_buff *skb)
 {
-	switch (skb->protocol) {
+	switch (__builtin_expect(skb->protocol, ETH_P_IP)) {
 	case htons(ETH_P_ARP):
 	case htons(ETH_P_IP):
 	case htons(ETH_P_IPV6):
diff --git a/net/core/filter.c b/net/core/filter.c
index 48aa7c7320db..6b7ab16505aa 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2170,11 +2170,11 @@ static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto)
 	__be16 from_proto = skb->protocol;
 
 	if (from_proto == htons(ETH_P_IP) &&
-	      to_proto == htons(ETH_P_IPV6))
+	      unlikely(to_proto == htons(ETH_P_IPV6)))
 		return bpf_skb_proto_4_to_6(skb);
 
-	if (from_proto == htons(ETH_P_IPV6) &&
-	      to_proto == htons(ETH_P_IP))
+	if (unlikely(from_proto == htons(ETH_P_IPV6)) &&
+	    to_proto == htons(ETH_P_IP))
 		return bpf_skb_proto_6_to_4(skb);
 
 	return -ENOTSUPP;
@@ -2240,7 +2240,7 @@ static const struct bpf_func_proto bpf_skb_change_type_proto = {
 
 static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
 {
-	switch (skb->protocol) {
+	switch (__builtin_expect(skb->protocol, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		return sizeof(struct iphdr);
 	case htons(ETH_P_IPV6):
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 857e4e6f751a..6236c7c18740 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4642,7 +4642,7 @@ int skb_checksum_setup(struct sk_buff *skb, bool recalculate)
 {
 	int err;
 
-	switch (skb->protocol) {
+	switch (__builtin_expect(skb->protocol, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		err = skb_checksum_setup_ipv4(skb, recalculate);
 		break;
diff --git a/net/core/tso.c b/net/core/tso.c
index 43f4eba61933..85da3c3b498b 100644
--- a/net/core/tso.c
+++ b/net/core/tso.c
@@ -21,7 +21,7 @@ void tso_build_hdr(struct sk_buff *skb, char *hdr, struct tso_t *tso,
 	int mac_hdr_len = skb_network_offset(skb);
 
 	memcpy(hdr, skb->data, hdr_len);
-	if (!tso->ipv6) {
+	if (likely(!tso->ipv6)) {
 		struct iphdr *iph = (void *)(hdr + mac_hdr_len);
 
 		iph->id = htons(tso->ip_id);
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 0901de42ed85..6cf3e3e4cca3 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -189,9 +189,9 @@ static void ipgre_err(struct sk_buff *skb, u32 info,
 		return;
 
 #if IS_ENABLED(CONFIG_IPV6)
-       if (tpi->proto == htons(ETH_P_IPV6) &&
-           !ip6_err_gen_icmpv6_unreach(skb, iph->ihl * 4 + tpi->hdr_len,
-				       type, data_len))
+	if (unlikely(tpi->proto == htons(ETH_P_IPV6)) &&
+	    !ip6_err_gen_icmpv6_unreach(skb, iph->ihl * 4 + tpi->hdr_len,
+					type, data_len))
                return;
 #endif
 
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index a7fd1c5a2a14..74ac2caff5a5 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -541,7 +541,7 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
 		}
 	}
 #if IS_ENABLED(CONFIG_IPV6)
-	else if (skb->protocol == htons(ETH_P_IPV6)) {
+	else if (unlikely(skb->protocol == htons(ETH_P_IPV6))) {
 		struct rt6_info *rt6 = (struct rt6_info *)skb_dst(skb);
 
 		if (rt6 && mtu < dst_mtu(skb_dst(skb)) &&
@@ -587,7 +587,7 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, u8 proto)
 	if (tos == 1) {
 		if (skb->protocol == htons(ETH_P_IP))
 			tos = inner_iph->tos;
-		else if (skb->protocol == htons(ETH_P_IPV6))
+		else if (unlikely(skb->protocol == htons(ETH_P_IPV6)))
 			tos = ipv6_get_dsfield((const struct ipv6hdr *)inner_iph);
 	}
 	init_tunnel_flow(&fl4, proto, key->u.ipv4.dst, key->u.ipv4.src, 0,
@@ -609,7 +609,7 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, u8 proto)
 	if (ttl == 0) {
 		if (skb->protocol == htons(ETH_P_IP))
 			ttl = inner_iph->ttl;
-		else if (skb->protocol == htons(ETH_P_IPV6))
+		else if (unlikely(skb->protocol == htons(ETH_P_IPV6)))
 			ttl = ((const struct ipv6hdr *)inner_iph)->hop_limit;
 		else
 			ttl = ip4_dst_hoplimit(&rt->dst);
@@ -671,7 +671,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 			dst = rt_nexthop(rt, inner_iph->daddr);
 		}
 #if IS_ENABLED(CONFIG_IPV6)
-		else if (skb->protocol == htons(ETH_P_IPV6)) {
+		else if (unlikely(skb->protocol == htons(ETH_P_IPV6))) {
 			const struct in6_addr *addr6;
 			struct neighbour *neigh;
 			bool do_tx_error_icmp;
@@ -713,7 +713,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (skb->protocol == htons(ETH_P_IP)) {
 			tos = inner_iph->tos;
 			connected = false;
-		} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		} else if (unlikely(skb->protocol == htons(ETH_P_IPV6))) {
 			tos = ipv6_get_dsfield((const struct ipv6hdr *)inner_iph);
 			connected = false;
 		}
@@ -768,7 +768,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (skb->protocol == htons(ETH_P_IP))
 			ttl = inner_iph->ttl;
 #if IS_ENABLED(CONFIG_IPV6)
-		else if (skb->protocol == htons(ETH_P_IPV6))
+		else if (unlikely(skb->protocol == htons(ETH_P_IPV6)))
 			ttl = ((const struct ipv6hdr *)inner_iph)->hop_limit;
 #endif
 		else
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index b8f0db54b197..64b3eaa84974 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -183,7 +183,7 @@ static struct sock *ping_lookup(struct net *net, struct sk_buff *skb, u16 ident)
 		pr_debug("try to find: num = %d, daddr = %pI4, dif = %d\n",
 			 (int)ident, &ip_hdr(skb)->daddr, dif);
 #if IS_ENABLED(CONFIG_IPV6)
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+	} else if (unlikely(skb->protocol == htons(ETH_P_IPV6))) {
 		pr_debug("try to find: num = %d, daddr = %pI6c, dif = %d\n",
 			 (int)ident, &ipv6_hdr(skb)->daddr, dif);
 #endif
@@ -208,7 +208,7 @@ static struct sock *ping_lookup(struct net *net, struct sk_buff *skb, u16 ident)
 			    isk->inet_rcv_saddr != ip_hdr(skb)->daddr)
 				continue;
 #if IS_ENABLED(CONFIG_IPV6)
-		} else if (skb->protocol == htons(ETH_P_IPV6) &&
+		} else if (unlikely(skb->protocol == htons(ETH_P_IPV6)) &&
 			   sk->sk_family == AF_INET6) {
 
 			pr_debug("found: %p: num=%d, daddr=%pI6c, dif=%d\n", sk,
@@ -497,7 +497,7 @@ void ping_err(struct sk_buff *skb, int offset, u32 info)
 		type = icmp_hdr(skb)->type;
 		code = icmp_hdr(skb)->code;
 		icmph = (struct icmphdr *)(skb->data + offset);
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+	} else if (unlikely(skb->protocol == htons(ETH_P_IPV6))) {
 		family = AF_INET6;
 		type = icmp6_hdr(skb)->icmp6_type;
 		code = icmp6_hdr(skb)->icmp6_code;
@@ -565,7 +565,7 @@ void ping_err(struct sk_buff *skb, int offset, u32 info)
 			break;
 		}
 #if IS_ENABLED(CONFIG_IPV6)
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+	} else if (unlikely(skb->protocol == htons(ETH_P_IPV6))) {
 		harderr = pingv6_ops.icmpv6_err_convert(type, code, &err);
 #endif
 	}
@@ -929,7 +929,7 @@ int ping_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
 
 		if (inet6_sk(sk)->rxopt.all)
 			pingv6_ops.ip6_datagram_recv_common_ctl(sk, msg, skb);
-		if (skb->protocol == htons(ETH_P_IPV6) &&
+		if (unlikely(skb->protocol == htons(ETH_P_IPV6)) &&
 		    inet6_sk(sk)->rxopt.all)
 			pingv6_ops.ip6_datagram_recv_specific_ctl(sk, msg, skb);
 		else if (skb->protocol == htons(ETH_P_IP) && isk->cmsg_flags)
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index a9f7eca0b6a3..230249917ffc 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -474,7 +474,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 		sin->sin6_family = AF_INET6;
 		sin->sin6_flowinfo = 0;
 		sin->sin6_port = serr->port;
-		if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (unlikely(skb->protocol == htons(ETH_P_IPV6))) {
 			const struct ipv6hdr *ip6h = container_of((struct in6_addr *)(nh + serr->addr_offset),
 								  struct ipv6hdr, daddr);
 			sin->sin6_addr = ip6h->daddr;
@@ -499,7 +499,7 @@ int ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 		sin->sin6_family = AF_INET6;
 		if (np->rxopt.all)
 			ip6_datagram_recv_common_ctl(sk, msg, skb);
-		if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (unlikely(skb->protocol == htons(ETH_P_IPV6))) {
 			sin->sin6_addr = ipv6_hdr(skb)->saddr;
 			if (np->rxopt.all)
 				ip6_datagram_recv_specific_ctl(sk, msg, skb);
@@ -587,7 +587,7 @@ void ip6_datagram_recv_common_ctl(struct sock *sk, struct msghdr *msg,
 	if (np->rxopt.bits.rxinfo) {
 		struct in6_pktinfo src_info;
 
-		if (is_ipv6) {
+		if (unlikely(is_ipv6)) {
 			src_info.ipi6_ifindex = IP6CB(skb)->iif;
 			src_info.ipi6_addr = ipv6_hdr(skb)->daddr;
 		} else {
diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c
index 375a1881d93d..17c89edcde70 100644
--- a/net/netfilter/nf_flow_table_inet.c
+++ b/net/netfilter/nf_flow_table_inet.c
@@ -10,7 +10,7 @@ static unsigned int
 nf_flow_offload_inet_hook(void *priv, struct sk_buff *skb,
 			  const struct nf_hook_state *state)
 {
-	switch (skb->protocol) {
+	switch (__builtin_expect(skb->protocol, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		return nf_flow_offload_ip_hook(priv, skb, state);
 	case htons(ETH_P_IPV6):
diff --git a/net/netfilter/nf_tables_netdev.c b/net/netfilter/nf_tables_netdev.c
index 4041fafca934..0fc5cc45d238 100644
--- a/net/netfilter/nf_tables_netdev.c
+++ b/net/netfilter/nf_tables_netdev.c
@@ -23,7 +23,7 @@ nft_do_chain_netdev(void *priv, struct sk_buff *skb,
 
 	nft_set_pktinfo(&pkt, skb, state);
 
-	switch (skb->protocol) {
+	switch (__builtin_expect(skb->protocol, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		nft_set_pktinfo_ipv4_validate(&pkt, skb);
 		break;
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 8bba23160a68..9db1303a3c9f 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -774,7 +774,7 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
 
 	skb = entry->skb;
 
-	switch (entry->state.pf) {
+	switch (__builtin_expect(entry->state.pf, NFPROTO_IPV4)) {
 	case NFPROTO_IPV4:
 		skb->protocol = htons(ETH_P_IP);
 		break;
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 30a5df27116e..ae7ddba3232b 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -909,7 +909,7 @@ static void ovs_fragment(struct net *net, struct vport *vport,
 
 		ip_do_fragment(net, skb->sk, skb, ovs_vport_output);
 		refdst_drop(orig_dst);
-	} else if (key->eth.type == htons(ETH_P_IPV6)) {
+	} else if (unlikely(key->eth.type == htons(ETH_P_IPV6))) {
 		const struct nf_ipv6_ops *v6ops = nf_get_ipv6_ops();
 		unsigned long orig_dst;
 		struct rt6_info ovs_rt;
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index c5904f629091..aeebcb46af8e 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -82,7 +82,7 @@ static void __ovs_ct_free_action(struct ovs_conntrack_info *ct_info);
 
 static u16 key_to_nfproto(const struct sw_flow_key *key)
 {
-	switch (ntohs(key->eth.type)) {
+	switch (__builtin_expect(ntohs(key->eth.type), ETH_P_IP)) {
 	case ETH_P_IP:
 		return NFPROTO_IPV4;
 	case ETH_P_IPV6:
@@ -188,7 +188,7 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
 			key->ipv4.ct_orig.dst = orig->dst.u3.ip;
 			__ovs_ct_update_key_orig_tp(key, orig, IPPROTO_ICMP);
 			return;
-		} else if (key->eth.type == htons(ETH_P_IPV6) &&
+		} else if (unlikely(key->eth.type == htons(ETH_P_IPV6)) &&
 			   !sw_flow_key_is_nd(key) &&
 			   nf_ct_l3num(ct) == NFPROTO_IPV6) {
 			key->ipv6.ct_orig.src = orig->src.u3.in6;
@@ -289,7 +289,7 @@ int ovs_ct_put_key(const struct sw_flow_key *swkey,
 			if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,
 				    sizeof(orig), &orig))
 				return -EMSGSIZE;
-		} else if (swkey->eth.type == htons(ETH_P_IPV6)) {
+		} else if (unlikely(swkey->eth.type == htons(ETH_P_IPV6))) {
 			struct ovs_key_ct_tuple_ipv6 orig = {
 				IN6_ADDR_INITIALIZER(output->ipv6.ct_orig.src),
 				IN6_ADDR_INITIALIZER(output->ipv6.ct_orig.dst),
@@ -484,7 +484,7 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key,
 
 		ovs_cb.mru = IPCB(skb)->frag_max_size;
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
-	} else if (key->eth.type == htons(ETH_P_IPV6)) {
+	} else if (unlikely(key->eth.type == htons(ETH_P_IPV6))) {
 		enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
 
 		memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
@@ -735,7 +735,7 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
 				err = NF_DROP;
 			goto push;
 		} else if (IS_ENABLED(CONFIG_NF_NAT_IPV6) &&
-			   skb->protocol == htons(ETH_P_IPV6)) {
+			   unlikely(skb->protocol == htons(ETH_P_IPV6))) {
 			__be16 frag_off;
 			u8 nexthdr = ipv6_hdr(skb)->nexthdr;
 			int hdrlen = ipv6_skip_exthdr(skb,
@@ -797,7 +797,7 @@ static void ovs_nat_update_key(struct sw_flow_key *key,
 		key->ct_state |= OVS_CS_F_SRC_NAT;
 		if (key->eth.type == htons(ETH_P_IP))
 			key->ipv4.addr.src = ip_hdr(skb)->saddr;
-		else if (key->eth.type == htons(ETH_P_IPV6))
+		else if (unlikely(key->eth.type == htons(ETH_P_IPV6)))
 			memcpy(&key->ipv6.addr.src, &ipv6_hdr(skb)->saddr,
 			       sizeof(key->ipv6.addr.src));
 		else
@@ -819,7 +819,7 @@ static void ovs_nat_update_key(struct sw_flow_key *key,
 		key->ct_state |= OVS_CS_F_DST_NAT;
 		if (key->eth.type == htons(ETH_P_IP))
 			key->ipv4.addr.dst = ip_hdr(skb)->daddr;
-		else if (key->eth.type == htons(ETH_P_IPV6))
+		else if (unlikely(key->eth.type == htons(ETH_P_IPV6)))
 			memcpy(&key->ipv6.addr.dst, &ipv6_hdr(skb)->daddr,
 			       sizeof(key->ipv6.addr.dst));
 		else
@@ -1109,7 +1109,7 @@ static int ovs_skb_network_trim(struct sk_buff *skb)
 	unsigned int len;
 	int err;
 
-	switch (skb->protocol) {
+	switch (__builtin_expect(skb->protocol, ETH_P_IP)) {
 	case htons(ETH_P_IP):
 		len = ntohs(ip_hdr(skb)->tot_len);
 		break;
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 56b8e7167790..f959364c29e8 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -735,7 +735,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 
 			stack_len += MPLS_HLEN;
 		}
-	} else if (key->eth.type == htons(ETH_P_IPV6)) {
+	} else if (unlikely(key->eth.type == htons(ETH_P_IPV6))) {
 		int nh_len;             /* IPv6 Header + Extensions */
 
 		nh_len = parse_ipv6hdr(skb, key);
@@ -910,7 +910,7 @@ int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
 	    key->eth.type != htons(ETH_P_IP))
 		return -EINVAL;
 	if (attrs & (1 << OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6) &&
-	    (key->eth.type != htons(ETH_P_IPV6) ||
+	    (likely(key->eth.type != htons(ETH_P_IPV6)) ||
 	     sw_flow_key_is_nd(key)))
 		return -EINVAL;
 
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index c670dd24b8b7..63d280c42f10 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -165,7 +165,7 @@ struct sw_flow_key {
 
 static inline bool sw_flow_key_is_nd(const struct sw_flow_key *key)
 {
-	return key->eth.type == htons(ETH_P_IPV6) &&
+	return unlikely(key->eth.type == htons(ETH_P_IPV6)) &&
 		key->ip.proto == NEXTHDR_ICMP &&
 		key->tp.dst == 0 &&
 		(key->tp.src == htons(NDISC_NEIGHBOUR_SOLICITATION) ||
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 7322aa1e382e..33ba451efbf6 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -238,7 +238,7 @@ static bool match_validate(const struct sw_flow_match *match,
 		}
 	}
 
-	if (match->key->eth.type == htons(ETH_P_IPV6)) {
+	if (unlikely(match->key->eth.type == htons(ETH_P_IPV6))) {
 		key_expected |= 1 << OVS_KEY_ATTR_IPV6;
 		if (match->mask && match->mask->key.eth.type == htons(0xffff)) {
 			mask_allowed |= 1 << OVS_KEY_ATTR_IPV6;
@@ -2070,7 +2070,7 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
 		ipv4_key->ipv4_tos = output->ip.tos;
 		ipv4_key->ipv4_ttl = output->ip.ttl;
 		ipv4_key->ipv4_frag = output->ip.frag;
-	} else if (swkey->eth.type == htons(ETH_P_IPV6)) {
+	} else if (unlikely(swkey->eth.type == htons(ETH_P_IPV6))) {
 		struct ovs_key_ipv6 *ipv6_key;
 
 		nla = nla_reserve(skb, OVS_KEY_ATTR_IPV6, sizeof(*ipv6_key));
@@ -2114,7 +2114,7 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
 	}
 
 	if ((swkey->eth.type == htons(ETH_P_IP) ||
-	     swkey->eth.type == htons(ETH_P_IPV6)) &&
+	     unlikely(swkey->eth.type == htons(ETH_P_IPV6))) &&
 	     swkey->ip.frag != OVS_FRAG_TYPE_LATER) {
 
 		if (swkey->ip.proto == IPPROTO_TCP) {
@@ -2157,7 +2157,7 @@ static int __ovs_nla_put_key(const struct sw_flow_key *swkey,
 			icmp_key = nla_data(nla);
 			icmp_key->icmp_type = ntohs(output->tp.src);
 			icmp_key->icmp_code = ntohs(output->tp.dst);
-		} else if (swkey->eth.type == htons(ETH_P_IPV6) &&
+		} else if (unlikely(swkey->eth.type == htons(ETH_P_IPV6)) &&
 			   swkey->ip.proto == IPPROTO_ICMPV6) {
 			struct ovs_key_icmpv6 *icmpv6_key;
 
@@ -2682,7 +2682,7 @@ static int validate_set(const struct nlattr *a,
 		break;
 
 	case OVS_KEY_ATTR_IPV6:
-		if (eth_type != htons(ETH_P_IPV6))
+		if (likely(eth_type != htons(ETH_P_IPV6)))
 			return -EINVAL;
 
 		ipv6_key = nla_data(ovs_key);
@@ -2711,7 +2711,7 @@ static int validate_set(const struct nlattr *a,
 
 	case OVS_KEY_ATTR_TCP:
 		if ((eth_type != htons(ETH_P_IP) &&
-		     eth_type != htons(ETH_P_IPV6)) ||
+		     likely(eth_type != htons(ETH_P_IPV6))) ||
 		    flow_key->ip.proto != IPPROTO_TCP)
 			return -EINVAL;
 
@@ -2719,7 +2719,7 @@ static int validate_set(const struct nlattr *a,
 
 	case OVS_KEY_ATTR_UDP:
 		if ((eth_type != htons(ETH_P_IP) &&
-		     eth_type != htons(ETH_P_IPV6)) ||
+		     likely(eth_type != htons(ETH_P_IPV6))) ||
 		    flow_key->ip.proto != IPPROTO_UDP)
 			return -EINVAL;
 
@@ -2732,7 +2732,7 @@ static int validate_set(const struct nlattr *a,
 
 	case OVS_KEY_ATTR_SCTP:
 		if ((eth_type != htons(ETH_P_IP) &&
-		     eth_type != htons(ETH_P_IPV6)) ||
+		     likely(eth_type != htons(ETH_P_IPV6))) ||
 		    flow_key->ip.proto != IPPROTO_SCTP)
 			return -EINVAL;
 
@@ -2924,7 +2924,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			 */
 			if (vlan_tci & htons(VLAN_TAG_PRESENT) ||
 			    (eth_type != htons(ETH_P_IP) &&
-			     eth_type != htons(ETH_P_IPV6) &&
+			     likely(eth_type != htons(ETH_P_IPV6)) &&
 			     eth_type != htons(ETH_P_ARP) &&
 			     eth_type != htons(ETH_P_RARP) &&
 			     !eth_p_mpls(eth_type)))
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 89b178a78dc7..870cd06adbef 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -279,7 +279,7 @@ void xfrm_local_error(struct sk_buff *skb, int mtu)
 
 	if (skb->protocol == htons(ETH_P_IP))
 		proto = AF_INET;
-	else if (skb->protocol == htons(ETH_P_IPV6))
+	else if (unlikely(skb->protocol == htons(ETH_P_IPV6)))
 		proto = AF_INET6;
 	else
 		return;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] net: improve ipv4 performances
  2018-04-01 18:31 [PATCH] net: improve ipv4 performances Anton Gary Ceph
@ 2018-04-01 18:50 ` Stephen Hemminger
  2018-04-02  0:51   ` Md. Islam
  2018-04-02  4:49 ` Eric Dumazet
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 7+ messages in thread
From: Stephen Hemminger @ 2018-04-01 18:50 UTC (permalink / raw)
  To: Anton Gary Ceph; +Cc: netdev, linux-kernel

On Sun,  1 Apr 2018 20:31:21 +0200
Anton Gary Ceph <agaceph@gmail.com> wrote:

> As the Linux networking stack is growing, more and more protocols are
> added, increasing the complexity of stack itself.
> Modern processors, contrary to common belief, are very bad in branch
> prediction, so it's our task to give hints to the compiler when possible.
> 
> After a few profiling and analysis, turned out that the ethertype field
> of the packets has the following distribution:
> 
>     92.1% ETH_P_IP
>      3.2% ETH_P_ARP
>      2.7% ETH_P_8021Q
>      1.4% ETH_P_PPP_SES
>      0.6% don't know/no opinion
> 
> From a projection on statistics collected by Google about IPv6 adoption[1],
> IPv6 should peak at 25% usage at the beginning of 2030. Hence, we should
> give proper hints to the compiler about the low IPv6 usage.
> 
> Here is an iperf3 run before and after the patch:
> 
> Before:
> [ ID]  Interval           Transfer    Bandwidth       Retr
> [  4]  0.00-100.00 sec    100 GBytes  8.60 Gbits/sec  0       sender
> [  4]  0.00-100.00 sec    100 GBytes  8.60 Gbits/sec          receiver
> 
> After
> [ ID]  Interval           Transfer    Bandwidth       Retr
> [  4]  0.00-100.00 sec    109 GBytes  9.35 Gbits/sec  0       sender
> [  4]  0.00-100.00 sec    109 GBytes  9.35 Gbits/sec          receiver
> 
> [1] https://www.google.com/intl/en/ipv6/statistics.html
> 
> Signed-off-by: Anton Gary Ceph <agaceph@gmail.com>

I am surprised it makes that much of an impact.

It would be easier to manage future bisection if the big patch
was split into several pieces. Bridge,  bonding, netfilter, etc.
There doesn't appear to be any direct cross dependencies.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net: improve ipv4 performances
  2018-04-01 18:50 ` Stephen Hemminger
@ 2018-04-02  0:51   ` Md. Islam
  0 siblings, 0 replies; 7+ messages in thread
From: Md. Islam @ 2018-04-02  0:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Anton Gary Ceph, netdev, linux-kernel

Yes, I'm also seeing good performance improvement after adding
likely() and prefetch().

On Sun, Apr 1, 2018 at 2:50 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Sun,  1 Apr 2018 20:31:21 +0200
> Anton Gary Ceph <agaceph@gmail.com> wrote:
>
>> As the Linux networking stack is growing, more and more protocols are
>> added, increasing the complexity of stack itself.
>> Modern processors, contrary to common belief, are very bad in branch
>> prediction, so it's our task to give hints to the compiler when possible.
>>
>> After a few profiling and analysis, turned out that the ethertype field
>> of the packets has the following distribution:
>>
>>     92.1% ETH_P_IP
>>      3.2% ETH_P_ARP
>>      2.7% ETH_P_8021Q
>>      1.4% ETH_P_PPP_SES
>>      0.6% don't know/no opinion
>>
>> From a projection on statistics collected by Google about IPv6 adoption[1],
>> IPv6 should peak at 25% usage at the beginning of 2030. Hence, we should
>> give proper hints to the compiler about the low IPv6 usage.
>>
>> Here is an iperf3 run before and after the patch:
>>
>> Before:
>> [ ID]  Interval           Transfer    Bandwidth       Retr
>> [  4]  0.00-100.00 sec    100 GBytes  8.60 Gbits/sec  0       sender
>> [  4]  0.00-100.00 sec    100 GBytes  8.60 Gbits/sec          receiver
>>
>> After
>> [ ID]  Interval           Transfer    Bandwidth       Retr
>> [  4]  0.00-100.00 sec    109 GBytes  9.35 Gbits/sec  0       sender
>> [  4]  0.00-100.00 sec    109 GBytes  9.35 Gbits/sec          receiver
>>
>> [1] https://www.google.com/intl/en/ipv6/statistics.html
>>
>> Signed-off-by: Anton Gary Ceph <agaceph@gmail.com>
>
> I am surprised it makes that much of an impact.
>
> It would be easier to manage future bisection if the big patch
> was split into several pieces. Bridge,  bonding, netfilter, etc.
> There doesn't appear to be any direct cross dependencies.
>
>



-- 
Tamim
PhD Candidate,
Kent State University
http://web.cs.kent.edu/~mislam4/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net: improve ipv4 performances
  2018-04-01 18:31 [PATCH] net: improve ipv4 performances Anton Gary Ceph
  2018-04-01 18:50 ` Stephen Hemminger
@ 2018-04-02  4:49 ` Eric Dumazet
  2018-04-02  7:57 ` kbuild test robot
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2018-04-02  4:49 UTC (permalink / raw)
  To: Anton Gary Ceph, netdev, linux-kernel



On 04/01/2018 11:31 AM, Anton Gary Ceph wrote:
> As the Linux networking stack is growing, more and more protocols are
> added, increasing the complexity of stack itself.
> Modern processors, contrary to common belief, are very bad in branch
> prediction, so it's our task to give hints to the compiler when possible.
> 
> After a few profiling and analysis, turned out that the ethertype field
> of the packets has the following distribution:
> 
>     92.1% ETH_P_IP
>      3.2% ETH_P_ARP
>      2.7% ETH_P_8021Q
>      1.4% ETH_P_PPP_SES
>      0.6% don't know/no opinion
> 
> From a projection on statistics collected by Google about IPv6 adoption[1],
> IPv6 should peak at 25% usage at the beginning of 2030. Hence, we should
> give proper hints to the compiler about the low IPv6 usage.
> 
> Here is an iperf3 run before and after the patch:
> 
> Before:
> [ ID]  Interval           Transfer    Bandwidth       Retr
> [  4]  0.00-100.00 sec    100 GBytes  8.60 Gbits/sec  0       sender
> [  4]  0.00-100.00 sec    100 GBytes  8.60 Gbits/sec          receiver
> 
> After
> [ ID]  Interval           Transfer    Bandwidth       Retr
> [  4]  0.00-100.00 sec    109 GBytes  9.35 Gbits/sec  0       sender
> [  4]  0.00-100.00 sec    109 GBytes  9.35 Gbits/sec          receiver
>

These iperf3 numbers are simply telling something is wrong in your measures or your hardware.

By the time linux kernels with this patch reach hosts, they will likely use IPv6 anyway.

Please do not tell the compiler that IPv6 should be slowed down in favor of IPv4.

Instead, work on removing IPv4 stack from linux kernel (making it a module)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net: improve ipv4 performances
  2018-04-01 18:31 [PATCH] net: improve ipv4 performances Anton Gary Ceph
  2018-04-01 18:50 ` Stephen Hemminger
  2018-04-02  4:49 ` Eric Dumazet
@ 2018-04-02  7:57 ` kbuild test robot
  2018-04-03 14:18 ` Douglas Caetano dos Santos
  2018-04-04 12:34 ` Paolo Abeni
  4 siblings, 0 replies; 7+ messages in thread
From: kbuild test robot @ 2018-04-02  7:57 UTC (permalink / raw)
  To: Anton Gary Ceph; +Cc: kbuild-all, netdev, linux-kernel

Hi Anton,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net/master]
[also build test WARNING on v4.16 next-20180329]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Anton-Gary-Ceph/net-improve-ipv4-performances/20180402-103807
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/bridge/br_private.h:690:15: sparse: restricted __be16 degrades to integer
   net/bridge/br_private.h:694:15: sparse: restricted __be16 degrades to integer
--
>> net/bridge/br_multicast.c:66:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:69:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:96:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:99:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:171:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:175:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:96:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:99:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:581:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:584:14: sparse: restricted __be16 degrades to integer
>> net/bridge/br_multicast.c:66:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:69:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:96:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:99:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:96:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:99:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:1325:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:1328:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:1765:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:1769:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:1913:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:1917:14: sparse: restricted __be16 degrades to integer
>> net/bridge/br_private.h:690:15: sparse: restricted __be16 degrades to integer
   net/bridge/br_private.h:694:15: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:2497:14: sparse: restricted __be16 degrades to integer
   net/bridge/br_multicast.c:2532:14: sparse: restricted __be16 degrades to integer
--
   net/core/filter.c:318:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:321:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:324:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:327:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:330:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:1184:39: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct sock_filter const *filter @@    got struct sockstruct sock_filter const *filter @@
   net/core/filter.c:1184:39:    expected struct sock_filter const *filter
   net/core/filter.c:1184:39:    got struct sock_filter [noderef] <asn:1>*filter
   net/core/filter.c:1286:39: sparse: incorrect type in argument 1 (different address spaces) @@    expected struct sock_filter const *filter @@    got struct sockstruct sock_filter const *filter @@
   net/core/filter.c:1286:39:    expected struct sock_filter const *filter
   net/core/filter.c:1286:39:    got struct sock_filter [noderef] <asn:1>*filter
   net/core/filter.c:1547:43: sparse: incorrect type in argument 2 (different base types) @@    expected restricted __wsum [usertype] diff @@    got unsigned lonrestricted __wsum [usertype] diff @@
   net/core/filter.c:1547:43:    expected restricted __wsum [usertype] diff
   net/core/filter.c:1547:43:    got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1550:36: sparse: incorrect type in argument 2 (different base types) @@    expected restricted __be16 [usertype] old @@    got unsigned lonrestricted __be16 [usertype] old @@
   net/core/filter.c:1550:36:    expected restricted __be16 [usertype] old
   net/core/filter.c:1550:36:    got unsigned long long [unsigned] [usertype] from
   net/core/filter.c:1550:42: sparse: incorrect type in argument 3 (different base types) @@    expected restricted __be16 [usertype] new @@    got unsigned lonrestricted __be16 [usertype] new @@
   net/core/filter.c:1550:42:    expected restricted __be16 [usertype] new
   net/core/filter.c:1550:42:    got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1553:36: sparse: incorrect type in argument 2 (different base types) @@    expected restricted __be32 [usertype] from @@    got unsigned lonrestricted __be32 [usertype] from @@
   net/core/filter.c:1553:36:    expected restricted __be32 [usertype] from
   net/core/filter.c:1553:36:    got unsigned long long [unsigned] [usertype] from
   net/core/filter.c:1553:42: sparse: incorrect type in argument 3 (different base types) @@    expected restricted __be32 [usertype] to @@    got unsigned lonrestricted __be32 [usertype] to @@
   net/core/filter.c:1553:42:    expected restricted __be32 [usertype] to
   net/core/filter.c:1553:42:    got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1598:59: sparse: incorrect type in argument 3 (different base types) @@    expected restricted __wsum [usertype] diff @@    got unsigned lonrestricted __wsum [usertype] diff @@
   net/core/filter.c:1598:59:    expected restricted __wsum [usertype] diff
   net/core/filter.c:1598:59:    got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1601:52: sparse: incorrect type in argument 3 (different base types) @@    expected restricted __be16 [usertype] from @@    got unsigned lonrestricted __be16 [usertype] from @@
   net/core/filter.c:1601:52:    expected restricted __be16 [usertype] from
   net/core/filter.c:1601:52:    got unsigned long long [unsigned] [usertype] from
   net/core/filter.c:1601:58: sparse: incorrect type in argument 4 (different base types) @@    expected restricted __be16 [usertype] to @@    got unsigned lonrestricted __be16 [usertype] to @@
   net/core/filter.c:1601:58:    expected restricted __be16 [usertype] to
   net/core/filter.c:1601:58:    got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1604:52: sparse: incorrect type in argument 3 (different base types) @@    expected restricted __be32 [usertype] from @@    got unsigned lonrestricted __be32 [usertype] from @@
   net/core/filter.c:1604:52:    expected restricted __be32 [usertype] from
   net/core/filter.c:1604:52:    got unsigned long long [unsigned] [usertype] from
   net/core/filter.c:1604:58: sparse: incorrect type in argument 4 (different base types) @@    expected restricted __be32 [usertype] to @@    got unsigned lonrestricted __be32 [usertype] to @@
   net/core/filter.c:1604:58:    expected restricted __be32 [usertype] to
   net/core/filter.c:1604:58:    got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1650:28: sparse: incorrect type in return expression (different base types) @@    expected unsigned long long @@    got nsigned long long @@
   net/core/filter.c:1650:28:    expected unsigned long long
   net/core/filter.c:1650:28:    got restricted __wsum
   net/core/filter.c:1672:35: sparse: incorrect type in return expression (different base types) @@    expected unsigned long long @@    got restricted unsigned long long @@
   net/core/filter.c:1672:35:    expected unsigned long long
   net/core/filter.c:1672:35:    got restricted __wsum [usertype] csum
>> net/core/filter.c:2244:14: sparse: restricted __be16 degrades to integer
   net/core/filter.c:2246:14: sparse: restricted __be16 degrades to integer
--
>> include/linux/netdevice.h:4035:14: sparse: restricted __be16 degrades to integer
   include/linux/netdevice.h:4037:14: sparse: restricted __be16 degrades to integer
>> net/core/skbuff.c:4646:14: sparse: restricted __be16 degrades to integer
   net/core/skbuff.c:4650:14: sparse: restricted __be16 degrades to integer
--
>> include/net/netfilter/nf_queue.h:83:14: sparse: restricted __be16 degrades to integer
   include/net/netfilter/nf_queue.h:89:14: sparse: restricted __be16 degrades to integer
>> include/net/netfilter/nf_queue.h:83:14: sparse: restricted __be16 degrades to integer
   include/net/netfilter/nf_queue.h:89:14: sparse: restricted __be16 degrades to integer
--
>> net/netfilter/nf_tables_netdev.c:27:14: sparse: restricted __be16 degrades to integer
   net/netfilter/nf_tables_netdev.c:30:14: sparse: restricted __be16 degrades to integer
--
>> include/net/netfilter/nf_queue.h:83:14: sparse: restricted __be16 degrades to integer
   include/net/netfilter/nf_queue.h:89:14: sparse: restricted __be16 degrades to integer
--
>> net/netfilter/nf_flow_table_inet.c:14:14: sparse: restricted __be16 degrades to integer
   net/netfilter/nf_flow_table_inet.c:16:14: sparse: restricted __be16 degrades to integer
--
>> net/openvswitch/conntrack.c:1113:14: sparse: restricted __be16 degrades to integer
   net/openvswitch/conntrack.c:1116:14: sparse: restricted __be16 degrades to integer

vim +690 net/bridge/br_private.h

cc0fdd80 Linus Lüssing       2013-08-30  685  
cc0fdd80 Linus Lüssing       2013-08-30  686  static inline bool br_multicast_querier_exists(struct net_bridge *br,
cc0fdd80 Linus Lüssing       2013-08-30  687  					       struct ethhdr *eth)
b00589af Linus Lüssing       2013-08-01  688  {
f9ba1e10 Anton Gary Ceph     2018-04-01  689  	switch (__builtin_expect(eth->h_proto, ETH_P_IP)) {
cc0fdd80 Linus Lüssing       2013-08-30 @690  	case (htons(ETH_P_IP)):
0888d5f3 daniel              2016-06-24  691  		return __br_multicast_querier_exists(br,
0888d5f3 daniel              2016-06-24  692  			&br->ip4_other_query, false);
cc0fdd80 Linus Lüssing       2013-08-30  693  #if IS_ENABLED(CONFIG_IPV6)
cc0fdd80 Linus Lüssing       2013-08-30  694  	case (htons(ETH_P_IPV6)):
0888d5f3 daniel              2016-06-24  695  		return __br_multicast_querier_exists(br,
0888d5f3 daniel              2016-06-24  696  			&br->ip6_other_query, true);
cc0fdd80 Linus Lüssing       2013-08-30  697  #endif
cc0fdd80 Linus Lüssing       2013-08-30  698  	default:
cc0fdd80 Linus Lüssing       2013-08-30  699  		return false;
cc0fdd80 Linus Lüssing       2013-08-30  700  	}
b00589af Linus Lüssing       2013-08-01  701  }
1080ab95 Nikolay Aleksandrov 2016-06-28  702  

:::::: The code at line 690 was first introduced by commit
:::::: cc0fdd802859eaeb00e1c87dbb655594bed2844c bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones

:::::: TO: Linus Lüssing <linus.luessing@web.de>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net: improve ipv4 performances
  2018-04-01 18:31 [PATCH] net: improve ipv4 performances Anton Gary Ceph
                   ` (2 preceding siblings ...)
  2018-04-02  7:57 ` kbuild test robot
@ 2018-04-03 14:18 ` Douglas Caetano dos Santos
  2018-04-04 12:34 ` Paolo Abeni
  4 siblings, 0 replies; 7+ messages in thread
From: Douglas Caetano dos Santos @ 2018-04-03 14:18 UTC (permalink / raw)
  To: Anton Gary Ceph, netdev, linux-kernel

Hi Anton, everyone,

On 04/01/18 15:31, Anton Gary Ceph wrote:
> As the Linux networking stack is growing, more and more protocols are
> added, increasing the complexity of stack itself.
> Modern processors, contrary to common belief, are very bad in branch
> prediction, so it's our task to give hints to the compiler when possible.
> 
> After a few profiling and analysis, turned out that the ethertype field
> of the packets has the following distribution:
> 
>     92.1% ETH_P_IP
>      3.2% ETH_P_ARP
>      2.7% ETH_P_8021Q
>      1.4% ETH_P_PPP_SES
>      0.6% don't know/no opinion
> 
> From a projection on statistics collected by Google about IPv6 adoption[1],
> IPv6 should peak at 25% usage at the beginning of 2030. Hence, we should
> give proper hints to the compiler about the low IPv6 usage.

My two cents on the matter:

You should not consider favoring some parts of code in detriment of another just because of one use case. In your patch, you're considering one server that attends for IPv4 and IPv6 connections simultaneously, in a proportion seen on the Internet, but you completely disregard the use cases of servers that could serve, for example, only IPv6. What about those, just let them slow down?

What I think about such hints and optimizations - someone correct me if I'm wrong - is that they should be done not with specific use cases in mind, but according to the code flow in general. For example, it could be a good idea to slow down ARP requests, because there is AFAIK not such a server that attends only ARP (not that I'm advocating for it, just using as an example). But slowing down IPv6, as Eric already said, is utterly non-sense.

Again, "low IPv6 usage" doesn't mean code that is barely touched, with an IPv6-only server being the obvious example.

-- 
Douglas

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] net: improve ipv4 performances
  2018-04-01 18:31 [PATCH] net: improve ipv4 performances Anton Gary Ceph
                   ` (3 preceding siblings ...)
  2018-04-03 14:18 ` Douglas Caetano dos Santos
@ 2018-04-04 12:34 ` Paolo Abeni
  4 siblings, 0 replies; 7+ messages in thread
From: Paolo Abeni @ 2018-04-04 12:34 UTC (permalink / raw)
  To: Anton Gary Ceph, netdev, linux-kernel

On Sun, 2018-04-01 at 20:31 +0200, Anton Gary Ceph wrote:
> After a few profiling and analysis, turned out that the ethertype field
> of the packets has the following distribution:
[...]
>      0.6% don't know/no opinion

Am I the only one finding the submission date and the above info
suspicious ?!?

/P

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-04-04 12:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-01 18:31 [PATCH] net: improve ipv4 performances Anton Gary Ceph
2018-04-01 18:50 ` Stephen Hemminger
2018-04-02  0:51   ` Md. Islam
2018-04-02  4:49 ` Eric Dumazet
2018-04-02  7:57 ` kbuild test robot
2018-04-03 14:18 ` Douglas Caetano dos Santos
2018-04-04 12:34 ` Paolo Abeni

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).