From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Robert Shearman <rshearma@brocade.com>,
	David Ahern <dsa@cumulusnetworks.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.9 26/51] net: mpls: Fix multipath selection for LSR use case
Date: Thu,  2 Feb 2017 19:37:45 +0100
Message-ID: <20170202183346.445601835@linuxfoundation.org>
In-Reply-To: <20170202183345.067336143@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: David Ahern <dsa@cumulusnetworks.com>


[ Upstream commit 9f427a0e474a67b454420c131709600d44850486 ]

MPLS multipath for LSR is broken -- always selecting the first nexthop
in the one label case. For example:

    $ ip -f mpls ro ls
    100
            nexthop as to 200 via inet 172.16.2.2  dev virt12
            nexthop as to 300 via inet 172.16.3.2  dev virt13
    101
            nexthop as to 201 via inet6 2000:2::2  dev virt12
            nexthop as to 301 via inet6 2000:3::2  dev virt13

In this example incoming packets have a single MPLS label, which means
the BOS (bottom of stack) bit is set. The BOS bit is passed from
mpls_forward down to mpls_multipath_hash, which never enters the hash
loop because BOS is 1.
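
To make the failure concrete, here is a minimal userspace sketch (not
kernel code) of the old loop guard. old_loop_iterations and
OLD_MAX_LABELS are hypothetical names used only for illustration;
OLD_MAX_LABELS stands in for MAX_MP_SELECT_LABELS and its value here is
an assumption.

```c
#include <stdbool.h>

#define OLD_MAX_LABELS 4 /* assumed value, illustration only */

/* Count how many times the old-style loop body runs for a given
 * initial BOS value passed in by the caller. */
static int old_loop_iterations(bool bos)
{
	int n = 0;

	for (int i = 0; i < OLD_MAX_LABELS && !bos; i++) {
		n++;
		bos = true; /* pretend the label just decoded was the bottom one */
	}
	return n;
}
```

With bos already true on entry the body never runs, so hash keeps its
initial value of 0 and "hash % alive" always picks index 0.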

Update mpls_multipath_hash to process the entire label stack. mpls_hdr_len
tracks the total MPLS header length on each pass (on pass N mpls_hdr_len
is N * sizeof(mpls_shim_hdr)). When the label with the BOS bit set is
found, it verifies the skb has sufficient header for IPv4 or IPv6, and
finds the IPv4 or IPv6 header by taking the last mpls_hdr pointer and
adding 1 to advance past it.
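
The walk described above can be sketched in userspace C over a plain
byte buffer. This is an illustration under stated assumptions, not the
kernel implementation: find_ip_payload, SHIM_LEN, SKETCH_MAX_LABELS and
the header-length constants are hypothetical names, and simple length
comparisons stand in for pskb_may_pull.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define SHIM_LEN          4u  /* one MPLS label stack entry */
#define SKETCH_MAX_LABELS 4   /* assumed cap, illustration only */
#define IPV4_HDR_MIN      20u /* IPv4 header without options */
#define IPV6_HDR_LEN      40u /* fixed IPv6 header */

/* Walk the label stack at the start of pkt. Returns 4 or 6 and stores
 * the payload offset in *payload_off when an IP header follows the
 * bottom-of-stack label, otherwise returns 0. */
static int find_ip_payload(const uint8_t *pkt, size_t len, size_t *payload_off)
{
	size_t mpls_hdr_len = 0;

	for (int i = 0; i < SKETCH_MAX_LABELS; i++) {
		uint32_t entry;
		unsigned int version;

		/* On pass N the running length is N * SHIM_LEN. */
		mpls_hdr_len += SHIM_LEN;
		if (len < mpls_hdr_len)
			break;

		/* Decode the current entry: label(20) | TC(3) | S(1) | TTL(8). */
		memcpy(&entry, pkt + mpls_hdr_len - SHIM_LEN, sizeof(entry));
		entry = ntohl(entry);
		if (!((entry >> 8) & 1))
			continue; /* not the bottom label yet */

		/* Bottom label found: is there room for an IP header? */
		if (len < mpls_hdr_len + IPV4_HDR_MIN)
			break;

		/* The payload starts right after the last shim header. */
		version = pkt[mpls_hdr_len] >> 4;
		if (version == 4) {
			*payload_off = mpls_hdr_len;
			return 4;
		}
		if (version == 6 && len >= mpls_hdr_len + IPV6_HDR_LEN) {
			*payload_off = mpls_hdr_len;
			return 6;
		}
		break;
	}
	return 0;
}
```

Tracking a running byte count keeps the length checks and the payload
offset in agreement, which is the role mpls_hdr_len plays in the patch.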

With these changes I have verified the code correctly sees the label,
BOS, IPv4 and IPv6 addresses in the network header and icmp/tcp/udp
traffic for ipv4 and ipv6 are distributed across the nexthops.

Fixes: 1c78efa8319ca ("mpls: flow-based multipath selection")
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/mpls/af_mpls.c |   48 +++++++++++++++++++++++++-----------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -98,18 +98,19 @@ bool mpls_pkt_too_big(const struct sk_bu
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static u32 mpls_multipath_hash(struct mpls_route *rt,
-			       struct sk_buff *skb, bool bos)
+static u32 mpls_multipath_hash(struct mpls_route *rt, struct sk_buff *skb)
 {
 	struct mpls_entry_decoded dec;
+	unsigned int mpls_hdr_len = 0;
 	struct mpls_shim_hdr *hdr;
 	bool eli_seen = false;
 	int label_index;
 	u32 hash = 0;
 
-	for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
+	for (label_index = 0; label_index < MAX_MP_SELECT_LABELS;
 	     label_index++) {
-		if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
+		mpls_hdr_len += sizeof(*hdr);
+		if (!pskb_may_pull(skb, mpls_hdr_len))
 			break;
 
 		/* Read and decode the current label */
@@ -134,37 +135,38 @@ static u32 mpls_multipath_hash(struct mp
 			eli_seen = true;
 		}
 
-		bos = dec.bos;
-		if (bos && pskb_may_pull(skb, sizeof(*hdr) * label_index +
-					 sizeof(struct iphdr))) {
+		if (!dec.bos)
+			continue;
+
+		/* found bottom label; does skb have room for a header? */
+		if (pskb_may_pull(skb, mpls_hdr_len + sizeof(struct iphdr))) {
 			const struct iphdr *v4hdr;
 
-			v4hdr = (const struct iphdr *)(mpls_hdr(skb) +
-						       label_index);
+			v4hdr = (const struct iphdr *)(hdr + 1);
 			if (v4hdr->version == 4) {
 				hash = jhash_3words(ntohl(v4hdr->saddr),
 						    ntohl(v4hdr->daddr),
 						    v4hdr->protocol, hash);
 			} else if (v4hdr->version == 6 &&
-				pskb_may_pull(skb, sizeof(*hdr) * label_index +
-					      sizeof(struct ipv6hdr))) {
+				   pskb_may_pull(skb, mpls_hdr_len +
+						 sizeof(struct ipv6hdr))) {
 				const struct ipv6hdr *v6hdr;
 
-				v6hdr = (const struct ipv6hdr *)(mpls_hdr(skb) +
-								label_index);
-
+				v6hdr = (const struct ipv6hdr *)(hdr + 1);
 				hash = __ipv6_addr_jhash(&v6hdr->saddr, hash);
 				hash = __ipv6_addr_jhash(&v6hdr->daddr, hash);
 				hash = jhash_1word(v6hdr->nexthdr, hash);
 			}
 		}
+
+		break;
 	}
 
 	return hash;
 }
 
 static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
-					     struct sk_buff *skb, bool bos)
+					     struct sk_buff *skb)
 {
 	int alive = ACCESS_ONCE(rt->rt_nhn_alive);
 	u32 hash = 0;
@@ -180,7 +182,7 @@ static struct mpls_nh *mpls_select_multi
 	if (alive <= 0)
 		return NULL;
 
-	hash = mpls_multipath_hash(rt, skb, bos);
+	hash = mpls_multipath_hash(rt, skb);
 	nh_index = hash % alive;
 	if (alive == rt->rt_nhn)
 		goto out;
@@ -278,17 +280,11 @@ static int mpls_forward(struct sk_buff *
 	hdr = mpls_hdr(skb);
 	dec = mpls_entry_decode(hdr);
 
-	/* Pop the label */
-	skb_pull(skb, sizeof(*hdr));
-	skb_reset_network_header(skb);
-
-	skb_orphan(skb);
-
 	rt = mpls_route_input_rcu(net, dec.label);
 	if (!rt)
 		goto drop;
 
-	nh = mpls_select_multipath(rt, skb, dec.bos);
+	nh = mpls_select_multipath(rt, skb);
 	if (!nh)
 		goto drop;
 
@@ -297,6 +293,12 @@ static int mpls_forward(struct sk_buff *
 	if (!mpls_output_possible(out_dev))
 		goto drop;
 
+	/* Pop the label */
+	skb_pull(skb, sizeof(*hdr));
+	skb_reset_network_header(skb);
+
+	skb_orphan(skb);
+
 	if (skb_warn_if_lro(skb))
 		goto drop;
 
