[PATCHES] Networking

From: David Miller <davem@davemloft.net>
To: stable@vger.kernel.org
Subject: [PATCHES] Networking
Date: Sun, 28 Jan 2018 11:22:33 -0500 (EST)	[thread overview]
Message-ID: <20180128.112233.1969044218458343617.davem@davemloft.net> (raw)

[-- Attachment #1: Type: Text/Plain, Size: 106 bytes --]


Please queue up the following networking bug fixes for v4.9
and v4.14 -stable, respectively.

Thank you.

[-- Attachment #2: net_49.mbox --]
[-- Type: Application/Octet-Stream, Size: 57900 bytes --]

From ac81f45d2e3dae761d383d5c05718d0c0404dd2b Mon Sep 17 00:00:00 2001
From: Alexey Kodanev <alexey.kodanev@oracle.com>
Date: Fri, 26 Jan 2018 15:14:16 +0300
Subject: [PATCH 01/23] dccp: don't restart ccid2_hc_tx_rto_expire() if sk in
 closed state

[ Upstream commit dd5684ecae3bd8e44b644f50e2c12c7e57fdfef5 ]

ccid2_hc_tx_rto_expire() timer callback always restarts the timer
again and can run indefinitely (unless it is stopped outside), and after
commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time"),
which moved ccid_hc_tx_delete() (also includes sk_stop_timer()) from
dccp_destroy_sock() to sk_destruct(), this started to happen quite often.
The timer prevents releasing the socket, as a result, sk_destruct() won't
be called.

Found with LTP/dccp_ipsec tests running on the bonding device,
which later couldn't be unloaded after the tests were completed:

  unregister_netdevice: waiting for bond0 to become free. Usage count = 148

Fixes: 2a91aa396739 ("[DCCP] CCID2: Initial CCID2 (TCP-Like) implementation")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/dccp/ccids/ccid2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/dccp/ccids/ccid2.c b/net/dccp/ccids/ccid2.c
index 5e3a7302f774..7753681195c1 100644
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -140,6 +140,9 @@ static void ccid2_hc_tx_rto_expire(unsigned long data)
 
 	ccid2_pr_debug("RTO_EXPIRE\n");
 
+	if (sk->sk_state == DCCP_CLOSED)
+		goto out;
+
 	/* back-off timer */
 	hc->tx_rto <<= 1;
 	if (hc->tx_rto > DCCP_RTO_MAX)
-- 
2.14.3


From f14e18e1fffedd6fbc789fbeac7e6298987873f4 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben.hutchings@codethink.co.uk>
Date: Mon, 22 Jan 2018 20:06:42 +0000
Subject: [PATCH 02/23] ipv6: Fix getsockopt() for sockets with default
 IPV6_AUTOFLOWLABEL

[ Upstream commit e9191ffb65d8e159680ce0ad2224e1acbde6985c ]

Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after
sysctl setting") removed the initialisation of
ipv6_pinfo::autoflowlabel and added a second flag to indicate
whether this field or the net namespace default should be used.

The getsockopt() handling for this case was not updated, so it
currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is
not explicitly enabled.  Fix it to return the effective value, whether
that has been set at the socket or net namespace level.

Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ipv6.h       | 1 +
 net/ipv6/ip6_output.c    | 2 +-
 net/ipv6/ipv6_sockglue.c | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 615ce0abba9c..e64210c98c2b 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -290,6 +290,7 @@ int ipv6_flowlabel_opt_get(struct sock *sk, struct in6_flowlabel_req *freq,
 			   int flags);
 int ip6_flowlabel_init(void);
 void ip6_flowlabel_cleanup(void);
+bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np);
 
 static inline void fl6_sock_release(struct ip6_flowlabel *fl)
 {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 388584b8ff31..6d000c0001fa 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -156,7 +156,7 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 			    !(IP6CB(skb)->flags & IP6SKB_REROUTED));
 }
 
-static bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np)
+bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np)
 {
 	if (!np->autoflowlabel_set)
 		return ip6_default_np_autolabel(net);
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 6e3871c7f8f7..bcea985dd76b 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -1316,7 +1316,7 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case IPV6_AUTOFLOWLABEL:
-		val = np->autoflowlabel;
+		val = ip6_autoflowlabel(sock_net(sk), np);
 		break;
 
 	default:
-- 
2.14.3


From 819ccc1028b0cfa3753b9d19f0819db6bf022d6e Mon Sep 17 00:00:00 2001
From: Mike Maloney <maloney@google.com>
Date: Wed, 10 Jan 2018 12:45:10 -0500
Subject: [PATCH 03/23] ipv6: fix udpv6 sendmsg crash caused by too small MTU

[ Upstream commit 749439bfac6e1a2932c582e2699f91d329658196 ]

The logic in __ip6_append_data() assumes that the MTU is at least large
enough for the headers.  A device's MTU may be adjusted after being
added while sendmsg() is processing data, resulting in
__ip6_append_data() seeing any MTU.  For an mtu smaller than the size of
the fragmentation header, the math results in a negative 'maxfraglen',
which causes problems when refragmenting any previous skb in the
skb_write_queue, leaving it possibly malformed.

Instead sendmsg returns EINVAL when the mtu is calculated to be less
than IPV6_MIN_MTU.

Found by syzkaller:
kernel BUG at ./include/linux/skbuff.h:2064!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d0b68580 task.stack: ffff8801ac6b8000
RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline]
RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617
RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216
RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000
RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0
RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000
R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8
R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000
FS:  00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ip6_finish_skb include/net/ipv6.h:911 [inline]
 udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093
 udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363
 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x352/0x5a0 net/socket.c:1750
 SyS_sendto+0x40/0x50 net/socket.c:1718
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512e9
RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9
RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005
RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c
R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69
R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000
Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba
RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570
RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Mike Maloney <maloney@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/ip6_output.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6d000c0001fa..af98bbe7af0f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1260,14 +1260,16 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	v6_cork->tclass = ipc6->tclass;
 	if (rt->dst.flags & DST_XFRM_TUNNEL)
 		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-		      rt->dst.dev->mtu : dst_mtu(&rt->dst);
+		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst);
 	else
 		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-		      rt->dst.dev->mtu : dst_mtu(rt->dst.path);
+		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(rt->dst.path);
 	if (np->frag_size < mtu) {
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
+	if (mtu < IPV6_MIN_MTU)
+		return -EINVAL;
 	cork->base.fragsize = mtu;
 	if (dst_allfrag(rt->dst.path))
 		cork->base.flags |= IPCORK_ALLFRAG;
-- 
2.14.3


From d143ed41c1ad0c35039f711af9dfa5b4f349d92b Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Thu, 11 Jan 2018 22:31:18 -0800
Subject: [PATCH 04/23] ipv6: ip6_make_skb() needs to clear cork.base.dst

[ Upstream commit 95ef498d977bf44ac094778fd448b98af158a3e6 ]

In my last patch, I missed fact that cork.base.dst was not initialized
in ip6_make_skb() :

If ip6_setup_cork() returns an error, we might attempt a dst_release()
on some random pointer.

Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/ip6_output.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index af98bbe7af0f..2e3db3619858 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1800,6 +1800,7 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
 	cork.base.flags = 0;
 	cork.base.addr = 0;
 	cork.base.opt = NULL;
+	cork.base.dst = NULL;
 	v6_cork.opt = NULL;
 	err = ip6_setup_cork(sk, &cork, &v6_cork, ipc6, rt, fl6);
 	if (err) {
-- 
2.14.3


From 72f89df9a4038a17bf6318822df837a2b6ae960b Mon Sep 17 00:00:00 2001
From: Yuiko Oshino <yuiko.oshino@microchip.com>
Date: Mon, 15 Jan 2018 13:24:28 -0500
Subject: [PATCH 05/23] lan78xx: Fix failure in USB Full Speed

[ Upstream commit a5b1379afbfabf91e3a689e82ac619a7157336b3 ]

Fix initialize the uninitialized tx_qlen to an appropriate value when USB
Full Speed is used.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/usb/lan78xx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 9c257ffedb15..c53385a0052f 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2197,6 +2197,7 @@ static int lan78xx_reset(struct lan78xx_net *dev)
 		buf = DEFAULT_BURST_CAP_SIZE / FS_USB_PKT_SIZE;
 		dev->rx_urb_size = DEFAULT_BURST_CAP_SIZE;
 		dev->rx_qlen = 4;
+		dev->tx_qlen = 4;
 	}
 
 	ret = lan78xx_write_reg(dev, BURST_CAP, buf);
-- 
2.14.3


From 0f1eba118f756f72ce5d4cedfb48979e67e0be03 Mon Sep 17 00:00:00 2001
From: Felix Fietkau <nbd@nbd.name>
Date: Fri, 19 Jan 2018 11:50:46 +0100
Subject: [PATCH 06/23] net: igmp: fix source address check for IGMPv3 reports

[ Upstream commit ad23b750933ea7bf962678972a286c78a8fa36aa ]

Commit "net: igmp: Use correct source address on IGMPv3 reports"
introduced a check to validate the source address of locally generated
IGMPv3 packets.
Instead of checking the local interface address directly, it uses
inet_ifa_match(fl4->saddr, ifa), which checks if the address is on the
local subnet (or equal to the point-to-point address if used).

This breaks for point-to-point interfaces, so check against
ifa->ifa_local directly.

Cc: Kevin Cernekee <cernekee@chromium.org>
Fixes: a46182b00290 ("net: igmp: Use correct source address on IGMPv3 reports")
Reported-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/igmp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 7bff0c65046f..9c7a4cea1628 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -332,7 +332,7 @@ static __be32 igmpv3_get_srcaddr(struct net_device *dev,
 		return htonl(INADDR_ANY);
 
 	for_ifa(in_dev) {
-		if (inet_ifa_match(fl4->saddr, ifa))
+		if (fl4->saddr == ifa->ifa_local)
 			return fl4->saddr;
 	} endfor_ifa(in_dev);
 
-- 
2.14.3


From 1b46775e3896273d3c37e78cde145b18bbc46b7c Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Thu, 18 Jan 2018 19:59:19 -0800
Subject: [PATCH 07/23] net: qdisc_pkt_len_init() should be more robust

[ Upstream commit 7c68d1a6b4db9012790af7ac0f0fdc0d2083422a ]

Without proper validation of DODGY packets, we might very well
feed qdisc_pkt_len_init() with invalid GSO packets.

tcp_hdrlen() might access out-of-bound data, so let's use
skb_header_pointer() and proper checks.

Whole story is described in commit d0c081b49137 ("flow_dissector:
properly cap thoff field")

We have the goal of validating DODGY packets earlier in the stack,
so we might very well revert this fix in the future.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jason Wang <jasowang@redhat.com>
Reported-by: syzbot+9da69ebac7dddd804552@syzkaller.appspotmail.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/dev.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 09007a71c8dd..67b5d4d8acb1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3083,10 +3083,21 @@ static void qdisc_pkt_len_init(struct sk_buff *skb)
 		hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
 
 		/* + transport layer */
-		if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)))
-			hdr_len += tcp_hdrlen(skb);
-		else
-			hdr_len += sizeof(struct udphdr);
+		if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) {
+			const struct tcphdr *th;
+			struct tcphdr _tcphdr;
+
+			th = skb_header_pointer(skb, skb_transport_offset(skb),
+						sizeof(_tcphdr), &_tcphdr);
+			if (likely(th))
+				hdr_len += __tcp_hdrlen(th);
+		} else {
+			struct udphdr _udphdr;
+
+			if (skb_header_pointer(skb, skb_transport_offset(skb),
+					       sizeof(_udphdr), &_udphdr))
+				hdr_len += sizeof(struct udphdr);
+		}
 
 		if (shinfo->gso_type & SKB_GSO_DODGY)
 			gso_segs = DIV_ROUND_UP(skb->len - hdr_len,
-- 
2.14.3


From 865753d765c23683eee75f5a7f06ed7b1ea101ee Mon Sep 17 00:00:00 2001
From: Dan Streetman <ddstreet@ieee.org>
Date: Thu, 18 Jan 2018 16:14:26 -0500
Subject: [PATCH 08/23] net: tcp: close sock if net namespace is exiting

[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]

When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.

For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open.  However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open.  In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence.  The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:

unregister_netdevice: waiting for lo to become free. Usage count = 1

These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.

After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/net_namespace.h | 10 ++++++++++
 net/ipv4/tcp.c              |  3 +++
 net/ipv4/tcp_timer.c        | 15 +++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 0940598c002f..23102da24dd9 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -213,6 +213,11 @@ int net_eq(const struct net *net1, const struct net *net2)
 	return net1 == net2;
 }
 
+static inline int check_net(const struct net *net)
+{
+	return atomic_read(&net->count) != 0;
+}
+
 void net_drop_ns(void *);
 
 #else
@@ -237,6 +242,11 @@ int net_eq(const struct net *net1, const struct net *net2)
 	return 1;
 }
 
+static inline int check_net(const struct net *net)
+{
+	return 1;
+}
+
 #define net_drop_ns NULL
 #endif
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 05d2bde00864..7efa6b062049 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2215,6 +2215,9 @@ void tcp_close(struct sock *sk, long timeout)
 			tcp_send_active_reset(sk, GFP_ATOMIC);
 			__NET_INC_STATS(sock_net(sk),
 					LINUX_MIB_TCPABORTONMEMORY);
+		} else if (!check_net(sock_net(sk))) {
+			/* Not possible to send reset; just close */
+			tcp_set_state(sk, TCP_CLOSE);
 		}
 	}
 
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 74db43b47917..69523389f067 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -50,11 +50,19 @@ static void tcp_write_err(struct sock *sk)
  *  to prevent DoS attacks. It is called when a retransmission timeout
  *  or zero probe timeout occurs on orphaned socket.
  *
+ *  Also close if our net namespace is exiting; in that case there is no
+ *  hope of ever communicating again since all netns interfaces are already
+ *  down (or about to be down), and we need to release our dst references,
+ *  which have been moved to the netns loopback interface, so the namespace
+ *  can finish exiting.  This condition is only possible if we are a kernel
+ *  socket, as those do not hold references to the namespace.
+ *
  *  Criteria is still not confirmed experimentally and may change.
  *  We kill the socket, if:
  *  1. If number of orphaned sockets exceeds an administratively configured
  *     limit.
  *  2. If we have strong memory pressure.
+ *  3. If our net namespace is exiting.
  */
 static int tcp_out_of_resources(struct sock *sk, bool do_reset)
 {
@@ -83,6 +91,13 @@ static int tcp_out_of_resources(struct sock *sk, bool do_reset)
 		__NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY);
 		return 1;
 	}
+
+	if (!check_net(sock_net(sk))) {
+		/* Not possible to send reset; just close */
+		tcp_done(sk);
+		return 1;
+	}
+
 	return 0;
 }
 
-- 
2.14.3


From e18f9929f1c592612ae2cb89e468a4590625202c Mon Sep 17 00:00:00 2001
From: Guillaume Nault <g.nault@alphalink.fr>
Date: Mon, 22 Jan 2018 18:06:37 +0100
Subject: [PATCH 09/23] pppoe: take ->needed_headroom of lower device into
 account on xmit

[ Upstream commit 02612bb05e51df8489db5e94d0cf8d1c81f87b0c ]

In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom
was probably fine before the introduction of ->needed_headroom in
commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom").

But now, virtual devices typically advertise the size of their overhead
in dev->needed_headroom, so we must also take it into account in
skb_reserve().
Allocation size of skb is also updated to take dev->needed_tailroom
into account and replace the arbitrary 32 bytes with the real size of
a PPPoE header.

This issue was discovered by syzbot, who connected a pppoe socket to a
gre device which had dev->header_ops->create == ipgre_header and
dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any
headroom, and dev_hard_header() crashed when ipgre_header() tried to
prepend its header to skb->data.

skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24
head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted
4.15.0-rc7-next-20180115+ #97
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100
RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282
RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000
RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc
RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0
R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180
FS:  00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  skb_under_panic net/core/skbuff.c:114 [inline]
  skb_push+0xce/0xf0 net/core/skbuff.c:1714
  ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879
  dev_hard_header include/linux/netdevice.h:2723 [inline]
  pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg+0xca/0x110 net/socket.c:640
  sock_write_iter+0x31a/0x5d0 net/socket.c:909
  call_write_iter include/linux/fs.h:1775 [inline]
  do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
  do_iter_write+0x154/0x540 fs/read_write.c:932
  vfs_writev+0x18a/0x340 fs/read_write.c:977
  do_writev+0xfc/0x2a0 fs/read_write.c:1012
  SYSC_writev fs/read_write.c:1085 [inline]
  SyS_writev+0x27/0x30 fs/read_write.c:1082
  entry_SYSCALL_64_fastpath+0x29/0xa0

Admittedly PPPoE shouldn't be allowed to run on non Ethernet-like
interfaces, but reserving space for ->needed_headroom is a more
fundamental issue that needs to be addressed first.

Same problem exists for __pppoe_xmit(), which also needs to take
dev->needed_headroom into account in skb_cow_head().

Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom")
Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ppp/pppoe.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 4ddae8118c85..dc36c2ec1d10 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -842,6 +842,7 @@ static int pppoe_sendmsg(struct socket *sock, struct msghdr *m,
 	struct pppoe_hdr *ph;
 	struct net_device *dev;
 	char *start;
+	int hlen;
 
 	lock_sock(sk);
 	if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) {
@@ -860,16 +861,16 @@ static int pppoe_sendmsg(struct socket *sock, struct msghdr *m,
 	if (total_len > (dev->mtu + dev->hard_header_len))
 		goto end;
 
-
-	skb = sock_wmalloc(sk, total_len + dev->hard_header_len + 32,
-			   0, GFP_KERNEL);
+	hlen = LL_RESERVED_SPACE(dev);
+	skb = sock_wmalloc(sk, hlen + sizeof(*ph) + total_len +
+			   dev->needed_tailroom, 0, GFP_KERNEL);
 	if (!skb) {
 		error = -ENOMEM;
 		goto end;
 	}
 
 	/* Reserve space for headers. */
-	skb_reserve(skb, dev->hard_header_len);
+	skb_reserve(skb, hlen);
 	skb_reset_network_header(skb);
 
 	skb->dev = dev;
@@ -930,7 +931,7 @@ static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb)
 	/* Copy the data if there is no space for the header or if it's
 	 * read-only.
 	 */
-	if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len))
+	if (skb_cow_head(skb, LL_RESERVED_SPACE(dev) + sizeof(*ph)))
 		goto abort;
 
 	__skb_push(skb, sizeof(*ph));
-- 
2.14.3


From 5c930fdb9ca65bf5fe6528d40a27d4ece0b86424 Mon Sep 17 00:00:00 2001
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Fri, 26 Jan 2018 01:53:26 +0100
Subject: [PATCH 10/23] r8169: fix memory corruption on retrieval of hardware
 statistics.

[ Upstream commit a78e93661c5fd30b9e1dee464b2f62f966883ef7 ]

Hardware statistics retrieval hurts in tight invocation loops.

Avoid extraneous write and enforce strict ordering of writes targeted to
the tally counters dump area address registers.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Oliver Freyermuth <o.freyermuth@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/realtek/r8169.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 2c4350a1c629..298b74ebc1e9 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -2222,19 +2222,14 @@ static bool rtl8169_do_counters(struct net_device *dev, u32 counter_cmd)
 	void __iomem *ioaddr = tp->mmio_addr;
 	dma_addr_t paddr = tp->counters_phys_addr;
 	u32 cmd;
-	bool ret;
 
 	RTL_W32(CounterAddrHigh, (u64)paddr >> 32);
+	RTL_R32(CounterAddrHigh);
 	cmd = (u64)paddr & DMA_BIT_MASK(32);
 	RTL_W32(CounterAddrLow, cmd);
 	RTL_W32(CounterAddrLow, cmd | counter_cmd);
 
-	ret = rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000);
-
-	RTL_W32(CounterAddrLow, 0);
-	RTL_W32(CounterAddrHigh, 0);
-
-	return ret;
+	return rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000);
 }
 
 static bool rtl8169_reset_counters(struct net_device *dev)
-- 
2.14.3


From 8bd44f95cc210a13daea4c4f18465b8fb91511eb Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 15 Jan 2018 17:02:00 +0800
Subject: [PATCH 11/23] sctp: do not allow the v4 socket to bind a v4mapped v6
 address

[ Upstream commit c5006b8aa74599ce19104b31d322d2ea9ff887cc ]

The check in sctp_sockaddr_af is not robust enough to forbid binding a
v4mapped v6 addr on a v4 socket.

The worse thing is that v4 socket's bind_verify would not convert this
v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4
socket bound a v6 addr.

This patch is to fix it by doing the common sa.sa_family check first,
then AF_INET check for v4mapped v6 addrs.

Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.")
Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sctp/socket.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 7181ce6c62bf..b9260d4029d8 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -332,16 +332,14 @@ static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
 	if (len < sizeof (struct sockaddr))
 		return NULL;
 
+	if (!opt->pf->af_supported(addr->sa.sa_family, opt))
+		return NULL;
+
 	/* V4 mapped address are really of AF_INET family */
 	if (addr->sa.sa_family == AF_INET6 &&
-	    ipv6_addr_v4mapped(&addr->v6.sin6_addr)) {
-		if (!opt->pf->af_supported(AF_INET, opt))
-			return NULL;
-	} else {
-		/* Does this PF support this AF? */
-		if (!opt->pf->af_supported(addr->sa.sa_family, opt))
-			return NULL;
-	}
+	    ipv6_addr_v4mapped(&addr->v6.sin6_addr) &&
+	    !opt->pf->af_supported(AF_INET, opt))
+		return NULL;
 
 	/* If we get this far, af is valid. */
 	af = sctp_get_af_specific(addr->sa.sa_family);
-- 
2.14.3


From 8d97f6df2f93691a59f39d2bace21abc824142d3 Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 15 Jan 2018 17:01:36 +0800
Subject: [PATCH 12/23] sctp: return error if the asoc has been peeled off in
 sctp_wait_for_sndbuf

[ Upstream commit a0ff660058b88d12625a783ce9e5c1371c87951f ]

After commit cea0cc80a677 ("sctp: use the right sk after waking up from
wait_buf sleep"), it may change to lock another sk if the asoc has been
peeled off in sctp_wait_for_sndbuf.

However, the asoc's new sk could be already closed elsewhere, as it's in
the sendmsg context of the old sk that can't avoid the new sk's closing.
If the sk's last one refcnt is held by this asoc, later on after putting
this asoc, the new sk will be freed, while under it's own lock.

This patch is to revert that commit, but fix the old issue by returning
error under the old sk's lock.

Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep")
Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sctp/socket.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index b9260d4029d8..c472b8391dde 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -83,7 +83,7 @@
 static int sctp_writeable(struct sock *sk);
 static void sctp_wfree(struct sk_buff *skb);
 static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
-				size_t msg_len, struct sock **orig_sk);
+				size_t msg_len);
 static int sctp_wait_for_packet(struct sock *sk, int *err, long *timeo_p);
 static int sctp_wait_for_connect(struct sctp_association *, long *timeo_p);
 static int sctp_wait_for_accept(struct sock *sk, long timeo);
@@ -1956,7 +1956,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
 	timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
 	if (!sctp_wspace(asoc)) {
 		/* sk can be changed by peel off when waiting for buf. */
-		err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len, &sk);
+		err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len);
 		if (err) {
 			if (err == -ESRCH) {
 				/* asoc is already dead. */
@@ -7439,12 +7439,12 @@ void sctp_sock_rfree(struct sk_buff *skb)
 
 /* Helper function to wait for space in the sndbuf.  */
 static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
-				size_t msg_len, struct sock **orig_sk)
+				size_t msg_len)
 {
 	struct sock *sk = asoc->base.sk;
-	int err = 0;
 	long current_timeo = *timeo_p;
 	DEFINE_WAIT(wait);
+	int err = 0;
 
 	pr_debug("%s: asoc:%p, timeo:%ld, msg_len:%zu\n", __func__, asoc,
 		 *timeo_p, msg_len);
@@ -7473,17 +7473,13 @@ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
 		release_sock(sk);
 		current_timeo = schedule_timeout(current_timeo);
 		lock_sock(sk);
-		if (sk != asoc->base.sk) {
-			release_sock(sk);
-			sk = asoc->base.sk;
-			lock_sock(sk);
-		}
+		if (sk != asoc->base.sk)
+			goto do_error;
 
 		*timeo_p = current_timeo;
 	}
 
 out:
-	*orig_sk = sk;
 	finish_wait(&asoc->wait, &wait);
 
 	/* Release the association's refcnt.  */
-- 
2.14.3


From 5288673bd899ea5da8be349b8f44609089cb65f4 Mon Sep 17 00:00:00 2001
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Wed, 10 Jan 2018 12:50:25 -0800
Subject: [PATCH 13/23] tipc: fix a memory leak in tipc_nl_node_get_link()

[ Upstream commit 59b36613e85fb16ebf9feaf914570879cd5c2a21 ]

When tipc_node_find_by_name() fails, the nlmsg is not
freed.

While on it, switch to a goto label to properly
free it.

Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/tipc/node.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index 27753325e06e..5b3e1ea37b6d 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1848,36 +1848,38 @@ int tipc_nl_node_get_link(struct sk_buff *skb, struct genl_info *info)
 
 	if (strcmp(name, tipc_bclink_name) == 0) {
 		err = tipc_nl_add_bc_link(net, &msg);
-		if (err) {
-			nlmsg_free(msg.skb);
-			return err;
-		}
+		if (err)
+			goto err_free;
 	} else {
 		int bearer_id;
 		struct tipc_node *node;
 		struct tipc_link *link;
 
 		node = tipc_node_find_by_name(net, name, &bearer_id);
-		if (!node)
-			return -EINVAL;
+		if (!node) {
+			err = -EINVAL;
+			goto err_free;
+		}
 
 		tipc_node_read_lock(node);
 		link = node->links[bearer_id].link;
 		if (!link) {
 			tipc_node_read_unlock(node);
-			nlmsg_free(msg.skb);
-			return -EINVAL;
+			err = -EINVAL;
+			goto err_free;
 		}
 
 		err = __tipc_nl_add_link(net, &msg, link, 0);
 		tipc_node_read_unlock(node);
-		if (err) {
-			nlmsg_free(msg.skb);
-			return err;
-		}
+		if (err)
+			goto err_free;
 	}
 
 	return genlmsg_reply(msg.skb, info);
+
+err_free:
+	nlmsg_free(msg.skb);
+	return err;
 }
 
 int tipc_nl_node_reset_link_stats(struct sk_buff *skb, struct genl_info *info)
-- 
2.14.3


From 8f0d81fabb5ebccabf36c9ec7a4afdd4f7b1c194 Mon Sep 17 00:00:00 2001
From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 22 Jan 2018 16:06:37 -0500
Subject: [PATCH 14/23] vmxnet3: repair memory leak

[ Upstream commit 848b159835ddef99cc4193083f7e786c3992f580 ]

with the introduction of commit
b0eb57cb97e7837ebb746404c2c58c6f536f23fa, it appears that rq->buf_info
is improperly handled.  While it is heap allocated when an rx queue is
setup, and freed when torn down, an old line of code in
vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0]
being set to NULL prior to its being freed, causing a memory leak, which
eventually exhausts the system on repeated create/destroy operations
(for example, when  the mtu of a vmxnet3 interface is changed
frequently.

Fix is pretty straight forward, just move the NULL set to after the
free.

Tested by myself with successful results

Applies to net, and should likely be queued for stable, please

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-By: boyang@redhat.com
CC: boyang@redhat.com
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: David S. Miller <davem@davemloft.net>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/vmxnet3/vmxnet3_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index ef83ae3b0a44..4afba17e2403 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1616,7 +1616,6 @@ static void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq,
 					  rq->rx_ring[i].basePA);
 			rq->rx_ring[i].base = NULL;
 		}
-		rq->buf_info[i] = NULL;
 	}
 
 	if (rq->data_ring.base) {
@@ -1638,6 +1637,7 @@ static void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq,
 			(rq->rx_ring[0].size + rq->rx_ring[1].size);
 		dma_free_coherent(&adapter->pdev->dev, sz, rq->buf_info[0],
 				  rq->buf_info_pa);
+		rq->buf_info[0] = rq->buf_info[1] = NULL;
 	}
 }
 
-- 
2.14.3


From c9c52b4fd0552d17af45dd0456507171aca57775 Mon Sep 17 00:00:00 2001
From: Jim Westfall <jwestfall@surrealistic.net>
Date: Sun, 14 Jan 2018 04:18:50 -0800
Subject: [PATCH 15/23] net: Allow neigh contructor functions ability to modify
 the primary_key

[ Upstream commit 096b9854c04df86f03b38a97d40b6506e5730919 ]

Use n->primary_key instead of pkey to account for the possibility that a neigh
constructor function may have modified the primary_key value.

Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/neighbour.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f45f6198851f..7b315663f840 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -496,7 +496,7 @@ struct neighbour *__neigh_create(struct neigh_table *tbl, const void *pkey,
 	if (atomic_read(&tbl->entries) > (1 << nht->hash_shift))
 		nht = neigh_hash_grow(tbl, nht->hash_shift + 1);
 
-	hash_val = tbl->hash(pkey, dev, nht->hash_rnd) >> (32 - nht->hash_shift);
+	hash_val = tbl->hash(n->primary_key, dev, nht->hash_rnd) >> (32 - nht->hash_shift);
 
 	if (n->parms->dead) {
 		rc = ERR_PTR(-EINVAL);
@@ -508,7 +508,7 @@ struct neighbour *__neigh_create(struct neigh_table *tbl, const void *pkey,
 	     n1 != NULL;
 	     n1 = rcu_dereference_protected(n1->next,
 			lockdep_is_held(&tbl->lock))) {
-		if (dev == n1->dev && !memcmp(n1->primary_key, pkey, key_len)) {
+		if (dev == n1->dev && !memcmp(n1->primary_key, n->primary_key, key_len)) {
 			if (want_ref)
 				neigh_hold(n1);
 			rc = n1;
-- 
2.14.3


From 94e7e43592316ae2359c42781a4128893766a471 Mon Sep 17 00:00:00 2001
From: Jim Westfall <jwestfall@surrealistic.net>
Date: Sun, 14 Jan 2018 04:18:51 -0800
Subject: [PATCH 16/23] ipv4: Make neigh lookup keys for
 loopback/point-to-point devices be INADDR_ANY

[ Upstream commit cd9ff4de0107c65d69d02253bb25d6db93c3dbc1 ]

Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
to avoid making an entry for every remote ip the device needs to talk to.

This used the be the old behavior but became broken in a263b3093641f
(ipv4: Make neigh lookups directly in output packet path) and later removed
in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point
devices) because it was broken.

Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/arp.h | 3 +++
 net/ipv4/arp.c    | 7 ++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/net/arp.h b/include/net/arp.h
index 5e0f891d476c..1b3f86981757 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -19,6 +19,9 @@ static inline u32 arp_hashfn(const void *pkey, const struct net_device *dev, u32
 
 static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev, u32 key)
 {
+	if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT))
+		key = INADDR_ANY;
+
 	return ___neigh_lookup_noref(&arp_tbl, neigh_key_eq32, arp_hashfn, &key, dev);
 }
 
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 51b27ae09fbd..e60517eb1c3a 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -223,11 +223,16 @@ static bool arp_key_eq(const struct neighbour *neigh, const void *pkey)
 
 static int arp_constructor(struct neighbour *neigh)
 {
-	__be32 addr = *(__be32 *)neigh->primary_key;
+	__be32 addr;
 	struct net_device *dev = neigh->dev;
 	struct in_device *in_dev;
 	struct neigh_parms *parms;
+	u32 inaddr_any = INADDR_ANY;
 
+	if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT))
+		memcpy(neigh->primary_key, &inaddr_any, arp_tbl.key_len);
+
+	addr = *(__be32 *)neigh->primary_key;
 	rcu_read_lock();
 	in_dev = __in_dev_get_rcu(dev);
 	if (!in_dev) {
-- 
2.14.3


From 26a58328acaa46a21a4d0ff1d5bcf5a780313382 Mon Sep 17 00:00:00 2001
From: Guillaume Nault <g.nault@alphalink.fr>
Date: Wed, 10 Jan 2018 16:24:45 +0100
Subject: [PATCH 17/23] ppp: unlock all_ppp_mutex before registering device

[ Upstream commit 0171c41835591e9aa2e384b703ef9a6ae367c610 ]

ppp_dev_uninit(), which is the .ndo_uninit() handler of PPP devices,
needs to lock pn->all_ppp_mutex. Therefore we mustn't call
register_netdevice() with pn->all_ppp_mutex already locked, or we'd
deadlock in case register_netdevice() fails and calls .ndo_uninit().

Fortunately, we can unlock pn->all_ppp_mutex before calling
register_netdevice(). This lock protects pn->units_idr, which isn't
used in the device registration process.

However, keeping pn->all_ppp_mutex locked during device registration
did ensure that no device in transient state would be published in
pn->units_idr. In practice, unlocking it before calling
register_netdevice() doesn't change this property: ppp_unit_register()
is called with 'ppp_mutex' locked and all searches done in
pn->units_idr hold this lock too.

Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion")
Reported-and-tested-by: syzbot+367889b9c9e279219175@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ppp/ppp_generic.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index b883af93929c..fc4c2ccc3d22 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -1002,17 +1002,18 @@ static int ppp_unit_register(struct ppp *ppp, int unit, bool ifname_is_set)
 	if (!ifname_is_set)
 		snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
 
+	mutex_unlock(&pn->all_ppp_mutex);
+
 	ret = register_netdevice(ppp->dev);
 	if (ret < 0)
 		goto err_unit;
 
 	atomic_inc(&ppp_unit_count);
 
-	mutex_unlock(&pn->all_ppp_mutex);
-
 	return 0;
 
 err_unit:
+	mutex_lock(&pn->all_ppp_mutex);
 	unit_put(&pn->units_idr, ppp->file.index);
 err:
 	mutex_unlock(&pn->all_ppp_mutex);
-- 
2.14.3


From b54ba533e427eab3116fcd7db6d03338fcb8a5e2 Mon Sep 17 00:00:00 2001
From: Ivan Vecera <cera@cera.cz>
Date: Fri, 19 Jan 2018 20:23:50 +0100
Subject: [PATCH 18/23] be2net: restore properly promisc mode after queues
 reconfiguration

[ Upstream commit 52acf06451930eb4cefabd5ecea56e2d46c32f76 ]

The commit 622190669403 ("be2net: Request RSS capability of Rx interface
depending on number of Rx rings") modified be_update_queues() so the
IFACE (HW representation of the netdevice) is destroyed and then
re-created. This causes a regression because potential promiscuous mode
is not restored properly during be_open() because the driver thinks
that the HW has promiscuous mode already enabled.

Note that Lancer is not affected by this bug because RX-filter flags are
disabled during be_close() for this chipset.

Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>

Fixes: 622190669403 ("be2net: Request RSS capability of Rx interface depending on number of Rx rings")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 1644896568c4..b2eeecb26939 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4733,6 +4733,15 @@ int be_update_queues(struct be_adapter *adapter)
 
 	be_schedule_worker(adapter);
 
+	/*
+	 * The IF was destroyed and re-created. We need to clear
+	 * all promiscuous flags valid for the destroyed IF.
+	 * Without this promisc mode is not restored during
+	 * be_open() because the driver thinks that it is
+	 * already enabled in HW.
+	 */
+	adapter->if_flags &= ~BE_IF_FLAGS_ALL_PROMISCUOUS;
+
 	if (netif_running(netdev))
 		status = be_open(netdev);
 
-- 
2.14.3


From b7da2fffa2a33fd1549f6b23f059e8f7df9d09d8 Mon Sep 17 00:00:00 2001
From: Alexey Kodanev <alexey.kodanev@oracle.com>
Date: Thu, 18 Jan 2018 20:51:12 +0300
Subject: [PATCH 19/23] ip6_gre: init dev->mtu and dev->hard_header_len
 correctly

[ Upstream commit 128bb975dc3c25d00de04e503e2fe0a780d04459 ]

Commit b05229f44228 ("gre6: Cleanup GREv6 transmit path,
call common GRE functions") moved dev->mtu initialization
from ip6gre_tunnel_setup() to ip6gre_tunnel_init(), as a
result, the previously set values, before ndo_init(), are
reset in the following cases:

* rtnl_create_link() can update dev->mtu from IFLA_MTU
  parameter.

* ip6gre_tnl_link_config() is invoked before ndo_init() in
  netlink and ioctl setup, so ndo_init() can reset MTU
  adjustments with the lower device MTU as well, dev->mtu
  and dev->hard_header_len.

  Not applicable for ip6gretap because it has one more call
  to ip6gre_tnl_link_config(tunnel, 1) in ip6gre_tap_init().

Fix the first case by updating dev->mtu with 'tb[IFLA_MTU]'
parameter if a user sets it manually on a device creation,
and fix the second one by moving ip6gre_tnl_link_config()
call after register_netdevice().

Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
Fixes: db2ec95d1ba4 ("ip6_gre: Fix MTU setting")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/ip6_gre.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index c46066c5dc27..db2613b4a049 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -337,11 +337,12 @@ static struct ip6_tnl *ip6gre_tunnel_locate(struct net *net,
 
 	nt->dev = dev;
 	nt->net = dev_net(dev);
-	ip6gre_tnl_link_config(nt, 1);
 
 	if (register_netdevice(dev) < 0)
 		goto failed_free;
 
+	ip6gre_tnl_link_config(nt, 1);
+
 	/* Can use a lockless transmit, unless we generate output sequences */
 	if (!(nt->parms.o_flags & TUNNEL_SEQ))
 		dev->features |= NETIF_F_LLTX;
@@ -1263,7 +1264,6 @@ static void ip6gre_netlink_parms(struct nlattr *data[],
 
 static int ip6gre_tap_init(struct net_device *dev)
 {
-	struct ip6_tnl *tunnel;
 	int ret;
 
 	ret = ip6gre_tunnel_init_common(dev);
@@ -1272,10 +1272,6 @@ static int ip6gre_tap_init(struct net_device *dev)
 
 	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
 
-	tunnel = netdev_priv(dev);
-
-	ip6gre_tnl_link_config(tunnel, 1);
-
 	return 0;
 }
 
@@ -1370,7 +1366,6 @@ static int ip6gre_newlink(struct net *src_net, struct net_device *dev,
 
 	nt->dev = dev;
 	nt->net = dev_net(dev);
-	ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
 
 	dev->features		|= GRE6_FEATURES;
 	dev->hw_features	|= GRE6_FEATURES;
@@ -1396,6 +1391,11 @@ static int ip6gre_newlink(struct net *src_net, struct net_device *dev,
 	if (err)
 		goto out;
 
+	ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
+
+	if (tb[IFLA_MTU])
+		ip6_tnl_change_mtu(dev, nla_get_u32(tb[IFLA_MTU]));
+
 	dev_hold(dev);
 	ip6gre_tunnel_link(ign, nt);
 
-- 
2.14.3


From 8da7bfc438b30949f9622866523bff373bff9112 Mon Sep 17 00:00:00 2001
From: Willem de Bruijn <willemb@google.com>
Date: Fri, 19 Jan 2018 09:29:18 -0500
Subject: [PATCH 20/23] gso: validate gso_type in GSO handlers

[ Upstream commit 121d57af308d0cf943f08f4738d24d3966c38cd9 ]

Validate gso_type during segmentation as SKB_GSO_DODGY sources
may pass packets where the gso_type does not match the contents.

Syzkaller was able to enter the SCTP gso handler with a packet of
gso_type SKB_GSO_TCPV4.

On entry of transport layer gso handlers, verify that the gso_type
matches the transport protocol.

Fixes: 90017accff61 ("sctp: Add GSO support")
Link: http://lkml.kernel.org/r/<001a1137452496ffc305617e5fe0@google.com>
Reported-by: syzbot+fee64147a25aecd48055@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/tcp_offload.c   | 3 +++
 net/ipv4/udp_offload.c   | 3 +++
 net/ipv6/tcpv6_offload.c | 3 +++
 net/ipv6/udp_offload.c   | 3 +++
 net/sctp/offload.c       | 3 +++
 5 files changed, 15 insertions(+)

diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index bc68da38ea86..366b1becff9d 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -32,6 +32,9 @@ static void tcp_gso_tstamp(struct sk_buff *skb, unsigned int ts_seq,
 static struct sk_buff *tcp4_gso_segment(struct sk_buff *skb,
 					netdev_features_t features)
 {
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4))
+		return ERR_PTR(-EINVAL);
+
 	if (!pskb_may_pull(skb, sizeof(struct tcphdr)))
 		return ERR_PTR(-EINVAL);
 
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 6401574cd638..f4f616eaaeb8 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -205,6 +205,9 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 		goto out;
 	}
 
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP))
+		goto out;
+
 	if (!pskb_may_pull(skb, sizeof(struct udphdr)))
 		goto out;
 
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index d883c9204c01..278e49cd67d4 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -46,6 +46,9 @@ static struct sk_buff *tcp6_gso_segment(struct sk_buff *skb,
 {
 	struct tcphdr *th;
 
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6))
+		return ERR_PTR(-EINVAL);
+
 	if (!pskb_may_pull(skb, sizeof(*th)))
 		return ERR_PTR(-EINVAL);
 
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index e7d378c032cb..2bd2087bd105 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -55,6 +55,9 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 		const struct ipv6hdr *ipv6h;
 		struct udphdr *uh;
 
+		if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP))
+			goto out;
+
 		if (!pskb_may_pull(skb, sizeof(struct udphdr)))
 			goto out;
 
diff --git a/net/sctp/offload.c b/net/sctp/offload.c
index 4f5a2b580aa5..6300f28c9588 100644
--- a/net/sctp/offload.c
+++ b/net/sctp/offload.c
@@ -44,6 +44,9 @@ static struct sk_buff *sctp_gso_segment(struct sk_buff *skb,
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	struct sctphdr *sh;
 
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_SCTP))
+		goto out;
+
 	sh = sctp_hdr(skb);
 	if (!pskb_may_pull(skb, sizeof(*sh)))
 		goto out;
-- 
2.14.3


From 82c74f1d8c542de078e4db9c49cac165d39a98fe Mon Sep 17 00:00:00 2001
From: Yuval Mintz <yuvalm@mellanox.com>
Date: Wed, 24 Jan 2018 10:02:09 +0100
Subject: [PATCH 21/23] mlxsw: spectrum_router: Don't log an error on missing
 neighbor

[ Upstream commit 1ecdaea02ca6bfacf2ecda500dc1af51e9780c42 ]

Driver periodically samples all neighbors configured in device
in order to update the kernel regarding their state. When finding
an entry configured in HW that doesn't show in neigh_lookup()
driver logs an error message.
This introduces a race when removing multiple neighbors -
it's possible that a given entry would still be configured in HW
as its removal is still being processed but is already removed
from the kernel's neighbor tables.

Simply remove the error message and gracefully accept such events.

Fixes: c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Fixes: 60f040ca11b9 ("mlxsw: spectrum_router: Periodically dump active IPv6 neighbours")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 8aa91ddff287..16556011d571 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -765,11 +765,8 @@ static void mlxsw_sp_router_neigh_ent_ipv4_process(struct mlxsw_sp *mlxsw_sp,
 	dipn = htonl(dip);
 	dev = mlxsw_sp->rifs[rif]->dev;
 	n = neigh_lookup(&arp_tbl, &dipn, dev);
-	if (!n) {
-		netdev_err(dev, "Failed to find matching neighbour for IP=%pI4h\n",
-			   &dip);
+	if (!n)
 		return;
-	}
 
 	netdev_dbg(dev, "Updating neighbour with IP=%pI4h\n", &dip);
 	neigh_event_send(n, NULL);
-- 
2.14.3


From e09d5c3150f66275e6472d4e0268c3555961fd41 Mon Sep 17 00:00:00 2001
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 15 Jan 2018 11:37:29 -0800
Subject: [PATCH 22/23] tun: fix a memory leak for tfile->tx_array

[ Upstream commit 4df0bfc79904b7169dc77dcce44598b1545721f9 ]

tfile->tun could be detached before we close the tun fd,
via tun_detach_all(), so it should not be used to check for
tfile->tx_array.

As Jason suggested, we probably have to clean it up
unconditionally both in __tun_deatch() and tun_detach_all(),
but this requires to check if it is initialized or not.
Currently skb_array_cleanup() doesn't have such a check,
so I check it in the caller and introduce a helper function,
it is a bit ugly but we can always improve it in net-next.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 1576d9860599 ("tun: switch to use skb array for tx")
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/tun.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 518cbfbc8b65..eb6dc28e5e52 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -525,6 +525,14 @@ static void tun_queue_purge(struct tun_file *tfile)
 	skb_queue_purge(&tfile->sk.sk_error_queue);
 }
 
+static void tun_cleanup_tx_array(struct tun_file *tfile)
+{
+	if (tfile->tx_array.ring.queue) {
+		skb_array_cleanup(&tfile->tx_array);
+		memset(&tfile->tx_array, 0, sizeof(tfile->tx_array));
+	}
+}
+
 static void __tun_detach(struct tun_file *tfile, bool clean)
 {
 	struct tun_file *ntfile;
@@ -566,8 +574,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
 			    tun->dev->reg_state == NETREG_REGISTERED)
 				unregister_netdevice(tun->dev);
 		}
-		if (tun)
-			skb_array_cleanup(&tfile->tx_array);
+		tun_cleanup_tx_array(tfile);
 		sock_put(&tfile->sk);
 	}
 }
@@ -606,11 +613,13 @@ static void tun_detach_all(struct net_device *dev)
 		/* Drop read queue */
 		tun_queue_purge(tfile);
 		sock_put(&tfile->sk);
+		tun_cleanup_tx_array(tfile);
 	}
 	list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) {
 		tun_enable_queue(tfile);
 		tun_queue_purge(tfile);
 		sock_put(&tfile->sk);
+		tun_cleanup_tx_array(tfile);
 	}
 	BUG_ON(tun->numdisabled != 0);
 
@@ -2363,6 +2372,8 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 
 	sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
 
+	memset(&tfile->tx_array, 0, sizeof(tfile->tx_array));
+
 	return 0;
 }
 
-- 
2.14.3


From 131234b51ec6915a807dce43c65ec79bbc05aa33 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 17 Jan 2018 14:21:13 -0800
Subject: [PATCH 23/23] flow_dissector: properly cap thoff field

[ Upstream commit d0c081b49137cd3200f2023c0875723be66e7ce5 ]

syzbot reported yet another crash [1] that is caused by
insufficient validation of DODGY packets.

Two bugs are happening here to trigger the crash.

1) Flow dissection leaves with incorrect thoff field.

2) skb_probe_transport_header() sets transport header to this invalid
thoff, even if pointing after skb valid data.

3) qdisc_pkt_len_init() reads out-of-bound data because it
trusts tcp_hdrlen(skb)

Possible fixes :

- Full flow dissector validation before injecting bad DODGY packets in
the stack.
 This approach was attempted here : https://patchwork.ozlabs.org/patch/
861874/

- Have more robust functions in the core.
  This might be needed anyway for stable versions.

This patch fixes the flow dissection issue.

[1]
CPU: 1 PID: 3144 Comm: syzkaller271204 Not tainted 4.15.0-rc4-mm1+ #49
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:53
 print_address_description+0x73/0x250 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:355 [inline]
 kasan_report+0x23b/0x360 mm/kasan/report.c:413
 __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:432
 __tcp_hdrlen include/linux/tcp.h:35 [inline]
 tcp_hdrlen include/linux/tcp.h:40 [inline]
 qdisc_pkt_len_init net/core/dev.c:3160 [inline]
 __dev_queue_xmit+0x20d3/0x2200 net/core/dev.c:3465
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3554
 packet_snd net/packet/af_packet.c:2943 [inline]
 packet_sendmsg+0x3ad5/0x60a0 net/packet/af_packet.c:2968
 sock_sendmsg_nosec net/socket.c:628 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:638
 sock_write_iter+0x31a/0x5d0 net/socket.c:907
 call_write_iter include/linux/fs.h:1776 [inline]
 new_sync_write fs/read_write.c:469 [inline]
 __vfs_write+0x684/0x970 fs/read_write.c:482
 vfs_write+0x189/0x510 fs/read_write.c:544
 SYSC_write fs/read_write.c:589 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:581
 entry_SYSCALL_64_fastpath+0x1f/0x96

Fixes: 34fad54c2537 ("net: __skb_flow_dissect() must cap its return value")
Fixes: a6e544b0a88b ("flow_dissector: Jump to exit code in __skb_flow_dissect")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/flow_dissector.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 32e4e0158846..862d63ec56e4 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -550,8 +550,8 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 out_good:
 	ret = true;
 
-	key_control->thoff = (u16)nhoff;
 out:
+	key_control->thoff = min_t(u16, nhoff, skb ? skb->len : hlen);
 	key_basic->n_proto = proto;
 	key_basic->ip_proto = ip_proto;
 
@@ -559,7 +559,6 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 
 out_bad:
 	ret = false;
-	key_control->thoff = min_t(u16, nhoff, skb ? skb->len : hlen);
 	goto out;
 }
 EXPORT_SYMBOL(__skb_flow_dissect);
-- 
2.14.3


[-- Attachment #3: net_414.mbox --]
[-- Type: Application/Octet-Stream, Size: 88875 bytes --]

From 88ae940fd4e8e0506cc6a11cd2f34dfffbf83110 Mon Sep 17 00:00:00 2001
From: Jim Westfall <jwestfall@surrealistic.net>
Date: Sun, 14 Jan 2018 04:18:50 -0800
Subject: [PATCH 01/37] net: Allow neigh contructor functions ability to modify
 the primary_key

[ Upstream commit 096b9854c04df86f03b38a97d40b6506e5730919 ]

Use n->primary_key instead of pkey to account for the possibility that a neigh
constructor function may have modified the primary_key value.

Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/neighbour.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 16a1a4c4eb57..741ae2554190 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -532,7 +532,7 @@ struct neighbour *__neigh_create(struct neigh_table *tbl, const void *pkey,
 	if (atomic_read(&tbl->entries) > (1 << nht->hash_shift))
 		nht = neigh_hash_grow(tbl, nht->hash_shift + 1);
 
-	hash_val = tbl->hash(pkey, dev, nht->hash_rnd) >> (32 - nht->hash_shift);
+	hash_val = tbl->hash(n->primary_key, dev, nht->hash_rnd) >> (32 - nht->hash_shift);
 
 	if (n->parms->dead) {
 		rc = ERR_PTR(-EINVAL);
@@ -544,7 +544,7 @@ struct neighbour *__neigh_create(struct neigh_table *tbl, const void *pkey,
 	     n1 != NULL;
 	     n1 = rcu_dereference_protected(n1->next,
 			lockdep_is_held(&tbl->lock))) {
-		if (dev == n1->dev && !memcmp(n1->primary_key, pkey, key_len)) {
+		if (dev == n1->dev && !memcmp(n1->primary_key, n->primary_key, key_len)) {
 			if (want_ref)
 				neigh_hold(n1);
 			rc = n1;
-- 
2.14.3


From 7cb838dc638ad4961c3ac734442fb035e1152e69 Mon Sep 17 00:00:00 2001
From: Jim Westfall <jwestfall@surrealistic.net>
Date: Sun, 14 Jan 2018 04:18:51 -0800
Subject: [PATCH 02/37] ipv4: Make neigh lookup keys for
 loopback/point-to-point devices be INADDR_ANY

[ Upstream commit cd9ff4de0107c65d69d02253bb25d6db93c3dbc1 ]

Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
to avoid making an entry for every remote ip the device needs to talk to.

This used the be the old behavior but became broken in a263b3093641f
(ipv4: Make neigh lookups directly in output packet path) and later removed
in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point
devices) because it was broken.

Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/arp.h | 3 +++
 net/ipv4/arp.c    | 7 ++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/net/arp.h b/include/net/arp.h
index dc8cd47f883b..977aabfcdc03 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -20,6 +20,9 @@ static inline u32 arp_hashfn(const void *pkey, const struct net_device *dev, u32
 
 static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev, u32 key)
 {
+	if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT))
+		key = INADDR_ANY;
+
 	return ___neigh_lookup_noref(&arp_tbl, neigh_key_eq32, arp_hashfn, &key, dev);
 }
 
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 7c45b8896709..a1d1f50e0e19 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -223,11 +223,16 @@ static bool arp_key_eq(const struct neighbour *neigh, const void *pkey)
 
 static int arp_constructor(struct neighbour *neigh)
 {
-	__be32 addr = *(__be32 *)neigh->primary_key;
+	__be32 addr;
 	struct net_device *dev = neigh->dev;
 	struct in_device *in_dev;
 	struct neigh_parms *parms;
+	u32 inaddr_any = INADDR_ANY;
 
+	if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT))
+		memcpy(neigh->primary_key, &inaddr_any, arp_tbl.key_len);
+
+	addr = *(__be32 *)neigh->primary_key;
 	rcu_read_lock();
 	in_dev = __in_dev_get_rcu(dev);
 	if (!in_dev) {
-- 
2.14.3


From b8030fc22ef6423d454ee39e8176eb8c00748027 Mon Sep 17 00:00:00 2001
From: Alexey Kodanev <alexey.kodanev@oracle.com>
Date: Fri, 26 Jan 2018 15:14:16 +0300
Subject: [PATCH 03/37] dccp: don't restart ccid2_hc_tx_rto_expire() if sk in
 closed state

[ Upstream commit dd5684ecae3bd8e44b644f50e2c12c7e57fdfef5 ]

ccid2_hc_tx_rto_expire() timer callback always restarts the timer
again and can run indefinitely (unless it is stopped outside), and after
commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time"),
which moved ccid_hc_tx_delete() (also includes sk_stop_timer()) from
dccp_destroy_sock() to sk_destruct(), this started to happen quite often.
The timer prevents releasing the socket, as a result, sk_destruct() won't
be called.

Found with LTP/dccp_ipsec tests running on the bonding device,
which later couldn't be unloaded after the tests were completed:

  unregister_netdevice: waiting for bond0 to become free. Usage count = 148

Fixes: 2a91aa396739 ("[DCCP] CCID2: Initial CCID2 (TCP-Like) implementation")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/dccp/ccids/ccid2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/dccp/ccids/ccid2.c b/net/dccp/ccids/ccid2.c
index e1295d5f2c56..97791b0b1b51 100644
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -140,6 +140,9 @@ static void ccid2_hc_tx_rto_expire(unsigned long data)
 
 	ccid2_pr_debug("RTO_EXPIRE\n");
 
+	if (sk->sk_state == DCCP_CLOSED)
+		goto out;
+
 	/* back-off timer */
 	hc->tx_rto <<= 1;
 	if (hc->tx_rto > DCCP_RTO_MAX)
-- 
2.14.3


From 2af2e2923c874086cbabf4b29b588defdfaef886 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <ben.hutchings@codethink.co.uk>
Date: Mon, 22 Jan 2018 20:06:42 +0000
Subject: [PATCH 04/37] ipv6: Fix getsockopt() for sockets with default
 IPV6_AUTOFLOWLABEL

[ Upstream commit e9191ffb65d8e159680ce0ad2224e1acbde6985c ]

Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after
sysctl setting") removed the initialisation of
ipv6_pinfo::autoflowlabel and added a second flag to indicate
whether this field or the net namespace default should be used.

The getsockopt() handling for this case was not updated, so it
currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is
not explicitly enabled.  Fix it to return the effective value, whether
that has been set at the socket or net namespace level.

Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ipv6.h       | 1 +
 net/ipv6/ip6_output.c    | 2 +-
 net/ipv6/ipv6_sockglue.c | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 35e9dd2d18ba..9596aa93d6ef 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -291,6 +291,7 @@ int ipv6_flowlabel_opt_get(struct sock *sk, struct in6_flowlabel_req *freq,
 			   int flags);
 int ip6_flowlabel_init(void);
 void ip6_flowlabel_cleanup(void);
+bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np);
 
 static inline void fl6_sock_release(struct ip6_flowlabel *fl)
 {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 688ba5f7516b..08446e0ca411 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -166,7 +166,7 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 			    !(IP6CB(skb)->flags & IP6SKB_REROUTED));
 }
 
-static bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np)
+bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np)
 {
 	if (!np->autoflowlabel_set)
 		return ip6_default_np_autolabel(net);
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 90dbfa78a390..3b251760cb8c 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -1324,7 +1324,7 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case IPV6_AUTOFLOWLABEL:
-		val = np->autoflowlabel;
+		val = ip6_autoflowlabel(sock_net(sk), np);
 		break;
 
 	case IPV6_RECVFRAGSIZE:
-- 
2.14.3


From 2384e14a733e021decb30acf9c2e74ea28b76f14 Mon Sep 17 00:00:00 2001
From: Mike Maloney <maloney@google.com>
Date: Wed, 10 Jan 2018 12:45:10 -0500
Subject: [PATCH 05/37] ipv6: fix udpv6 sendmsg crash caused by too small MTU

[ Upstream commit 749439bfac6e1a2932c582e2699f91d329658196 ]

The logic in __ip6_append_data() assumes that the MTU is at least large
enough for the headers.  A device's MTU may be adjusted after being
added while sendmsg() is processing data, resulting in
__ip6_append_data() seeing any MTU.  For an mtu smaller than the size of
the fragmentation header, the math results in a negative 'maxfraglen',
which causes problems when refragmenting any previous skb in the
skb_write_queue, leaving it possibly malformed.

Instead sendmsg returns EINVAL when the mtu is calculated to be less
than IPV6_MIN_MTU.

Found by syzkaller:
kernel BUG at ./include/linux/skbuff.h:2064!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d0b68580 task.stack: ffff8801ac6b8000
RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline]
RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617
RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216
RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000
RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0
RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000
R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8
R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000
FS:  00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ip6_finish_skb include/net/ipv6.h:911 [inline]
 udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093
 udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363
 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x352/0x5a0 net/socket.c:1750
 SyS_sendto+0x40/0x50 net/socket.c:1718
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512e9
RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9
RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005
RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c
R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69
R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000
Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba
RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570
RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Mike Maloney <maloney@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/ip6_output.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 08446e0ca411..7ef142706d14 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1206,14 +1206,16 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	v6_cork->tclass = ipc6->tclass;
 	if (rt->dst.flags & DST_XFRM_TUNNEL)
 		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-		      rt->dst.dev->mtu : dst_mtu(&rt->dst);
+		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst);
 	else
 		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-		      rt->dst.dev->mtu : dst_mtu(rt->dst.path);
+		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(rt->dst.path);
 	if (np->frag_size < mtu) {
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
+	if (mtu < IPV6_MIN_MTU)
+		return -EINVAL;
 	cork->base.fragsize = mtu;
 	if (dst_allfrag(rt->dst.path))
 		cork->base.flags |= IPCORK_ALLFRAG;
-- 
2.14.3


From 4056b922913c33150acfac82cde7740d58771d7b Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Thu, 11 Jan 2018 22:31:18 -0800
Subject: [PATCH 06/37] ipv6: ip6_make_skb() needs to clear cork.base.dst

[ Upstream commit 95ef498d977bf44ac094778fd448b98af158a3e6 ]

In my last patch, I missed fact that cork.base.dst was not initialized
in ip6_make_skb() :

If ip6_setup_cork() returns an error, we might attempt a dst_release()
on some random pointer.

Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/ip6_output.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 7ef142706d14..3763dc01e374 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1735,6 +1735,7 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
 	cork.base.flags = 0;
 	cork.base.addr = 0;
 	cork.base.opt = NULL;
+	cork.base.dst = NULL;
 	v6_cork.opt = NULL;
 	err = ip6_setup_cork(sk, &cork, &v6_cork, ipc6, rt, fl6);
 	if (err) {
-- 
2.14.3


From aefece9d1ca251c27d2b9898569449d2e65f7eb5 Mon Sep 17 00:00:00 2001
From: Yuiko Oshino <yuiko.oshino@microchip.com>
Date: Mon, 15 Jan 2018 13:24:28 -0500
Subject: [PATCH 07/37] lan78xx: Fix failure in USB Full Speed

[ Upstream commit a5b1379afbfabf91e3a689e82ac619a7157336b3 ]

Fix initialize the uninitialized tx_qlen to an appropriate value when USB
Full Speed is used.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/usb/lan78xx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 0161f77641fa..a8dd1c7a08cb 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2396,6 +2396,7 @@ static int lan78xx_reset(struct lan78xx_net *dev)
 		buf = DEFAULT_BURST_CAP_SIZE / FS_USB_PKT_SIZE;
 		dev->rx_urb_size = DEFAULT_BURST_CAP_SIZE;
 		dev->rx_qlen = 4;
+		dev->tx_qlen = 4;
 	}
 
 	ret = lan78xx_write_reg(dev, BURST_CAP, buf);
-- 
2.14.3


From ee391b7768a93531d55edc21979dc8f593ef1a04 Mon Sep 17 00:00:00 2001
From: Felix Fietkau <nbd@nbd.name>
Date: Fri, 19 Jan 2018 11:50:46 +0100
Subject: [PATCH 08/37] net: igmp: fix source address check for IGMPv3 reports

[ Upstream commit ad23b750933ea7bf962678972a286c78a8fa36aa ]

Commit "net: igmp: Use correct source address on IGMPv3 reports"
introduced a check to validate the source address of locally generated
IGMPv3 packets.
Instead of checking the local interface address directly, it uses
inet_ifa_match(fl4->saddr, ifa), which checks if the address is on the
local subnet (or equal to the point-to-point address if used).

This breaks for point-to-point interfaces, so check against
ifa->ifa_local directly.

Cc: Kevin Cernekee <cernekee@chromium.org>
Fixes: a46182b00290 ("net: igmp: Use correct source address on IGMPv3 reports")
Reported-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/igmp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index c621266e0306..013fed55b610 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -332,7 +332,7 @@ static __be32 igmpv3_get_srcaddr(struct net_device *dev,
 		return htonl(INADDR_ANY);
 
 	for_ifa(in_dev) {
-		if (inet_ifa_match(fl4->saddr, ifa))
+		if (fl4->saddr == ifa->ifa_local)
 			return fl4->saddr;
 	} endfor_ifa(in_dev);
 
-- 
2.14.3


From b1d813d9001be9c2efdaa1289e1ff459fb189af9 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Thu, 18 Jan 2018 19:59:19 -0800
Subject: [PATCH 09/37] net: qdisc_pkt_len_init() should be more robust

[ Upstream commit 7c68d1a6b4db9012790af7ac0f0fdc0d2083422a ]

Without proper validation of DODGY packets, we might very well
feed qdisc_pkt_len_init() with invalid GSO packets.

tcp_hdrlen() might access out-of-bound data, so let's use
skb_header_pointer() and proper checks.

Whole story is described in commit d0c081b49137 ("flow_dissector:
properly cap thoff field")

We have the goal of validating DODGY packets earlier in the stack,
so we might very well revert this fix in the future.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jason Wang <jasowang@redhat.com>
Reported-by: syzbot+9da69ebac7dddd804552@syzkaller.appspotmail.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/dev.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 27357fc1730b..ffee085f0357 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3128,10 +3128,21 @@ static void qdisc_pkt_len_init(struct sk_buff *skb)
 		hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
 
 		/* + transport layer */
-		if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)))
-			hdr_len += tcp_hdrlen(skb);
-		else
-			hdr_len += sizeof(struct udphdr);
+		if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) {
+			const struct tcphdr *th;
+			struct tcphdr _tcphdr;
+
+			th = skb_header_pointer(skb, skb_transport_offset(skb),
+						sizeof(_tcphdr), &_tcphdr);
+			if (likely(th))
+				hdr_len += __tcp_hdrlen(th);
+		} else {
+			struct udphdr _udphdr;
+
+			if (skb_header_pointer(skb, skb_transport_offset(skb),
+					       sizeof(_udphdr), &_udphdr))
+				hdr_len += sizeof(struct udphdr);
+		}
 
 		if (shinfo->gso_type & SKB_GSO_DODGY)
 			gso_segs = DIV_ROUND_UP(skb->len - hdr_len,
-- 
2.14.3


From dc3ef99b1986664ee77e2c470a938d2061810693 Mon Sep 17 00:00:00 2001
From: Dan Streetman <ddstreet@ieee.org>
Date: Thu, 18 Jan 2018 16:14:26 -0500
Subject: [PATCH 10/37] net: tcp: close sock if net namespace is exiting

[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]

When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.

For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open.  However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open.  In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence.  The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:

unregister_netdevice: waiting for lo to become free. Usage count = 1

These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.

After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/net_namespace.h | 10 ++++++++++
 net/ipv4/tcp.c              |  3 +++
 net/ipv4/tcp_timer.c        | 15 +++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 10f99dafd5ac..049008493faf 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -223,6 +223,11 @@ int net_eq(const struct net *net1, const struct net *net2)
 	return net1 == net2;
 }
 
+static inline int check_net(const struct net *net)
+{
+	return atomic_read(&net->count) != 0;
+}
+
 void net_drop_ns(void *);
 
 #else
@@ -247,6 +252,11 @@ int net_eq(const struct net *net1, const struct net *net2)
 	return 1;
 }
 
+static inline int check_net(const struct net *net)
+{
+	return 1;
+}
+
 #define net_drop_ns NULL
 #endif
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a0c72b09cefc..2a65d806b562 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2273,6 +2273,9 @@ void tcp_close(struct sock *sk, long timeout)
 			tcp_send_active_reset(sk, GFP_ATOMIC);
 			__NET_INC_STATS(sock_net(sk),
 					LINUX_MIB_TCPABORTONMEMORY);
+		} else if (!check_net(sock_net(sk))) {
+			/* Not possible to send reset; just close */
+			tcp_set_state(sk, TCP_CLOSE);
 		}
 	}
 
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index e9af1879cd53..14ac7df95380 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -50,11 +50,19 @@ static void tcp_write_err(struct sock *sk)
  *  to prevent DoS attacks. It is called when a retransmission timeout
  *  or zero probe timeout occurs on orphaned socket.
  *
+ *  Also close if our net namespace is exiting; in that case there is no
+ *  hope of ever communicating again since all netns interfaces are already
+ *  down (or about to be down), and we need to release our dst references,
+ *  which have been moved to the netns loopback interface, so the namespace
+ *  can finish exiting.  This condition is only possible if we are a kernel
+ *  socket, as those do not hold references to the namespace.
+ *
  *  Criteria is still not confirmed experimentally and may change.
  *  We kill the socket, if:
  *  1. If number of orphaned sockets exceeds an administratively configured
  *     limit.
  *  2. If we have strong memory pressure.
+ *  3. If our net namespace is exiting.
  */
 static int tcp_out_of_resources(struct sock *sk, bool do_reset)
 {
@@ -83,6 +91,13 @@ static int tcp_out_of_resources(struct sock *sk, bool do_reset)
 		__NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY);
 		return 1;
 	}
+
+	if (!check_net(sock_net(sk))) {
+		/* Not possible to send reset; just close */
+		tcp_done(sk);
+		return 1;
+	}
+
 	return 0;
 }
 
-- 
2.14.3


From 521bf9f2e2fef6307516af456f0701205ad14700 Mon Sep 17 00:00:00 2001
From: "r.hering@avm.de" <r.hering@avm.de>
Date: Fri, 12 Jan 2018 15:42:06 +0100
Subject: [PATCH 11/37] net/tls: Fix inverted error codes to avoid endless loop

[ Upstream commit 30be8f8dba1bd2aff73e8447d59228471233a3d4 ]

sendfile() calls can hang endless with using Kernel TLS if a socket error occurs.
Socket error codes must be inverted by Kernel TLS before returning because
they are stored with positive sign. If returned non-inverted they are
interpreted as number of bytes sent, causing endless looping of the
splice mechanic behind sendfile().

Signed-off-by: Robert Hering <r.hering@avm.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/tls.h | 2 +-
 net/tls/tls_sw.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index c06db1eadac2..df950383b8c1 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -168,7 +168,7 @@ static inline bool tls_is_pending_open_record(struct tls_context *tls_ctx)
 
 static inline void tls_err_abort(struct sock *sk)
 {
-	sk->sk_err = -EBADMSG;
+	sk->sk_err = EBADMSG;
 	sk->sk_error_report(sk);
 }
 
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index f00383a37622..81cc4325682f 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -407,7 +407,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 
 	while (msg_data_left(msg)) {
 		if (sk->sk_err) {
-			ret = sk->sk_err;
+			ret = -sk->sk_err;
 			goto send_end;
 		}
 
@@ -560,7 +560,7 @@ int tls_sw_sendpage(struct sock *sk, struct page *page,
 		size_t copy, required_size;
 
 		if (sk->sk_err) {
-			ret = sk->sk_err;
+			ret = -sk->sk_err;
 			goto sendpage_end;
 		}
 
-- 
2.14.3


From 44c8029c26be54b005b3813ab8e85f9b0ab804e6 Mon Sep 17 00:00:00 2001
From: David Ahern <dsahern@gmail.com>
Date: Wed, 24 Jan 2018 19:37:37 -0800
Subject: [PATCH 12/37] net: vrf: Add support for sends to local broadcast
 address

[ Upstream commit 1e19c4d689dc1e95bafd23ef68fbc0c6b9e05180 ]

Sukumar reported that sends to the local broadcast address
(255.255.255.255) are broken. Check for the address in vrf driver
and do not redirect to the VRF device - similar to multicast
packets.

With this change sockets can use SO_BINDTODEVICE to specify an
egress interface and receive responses. Note: the egress interface
can not be a VRF device but needs to be the enslaved device.

https://bugzilla.kernel.org/show_bug.cgi?id=198521

Reported-by: Sukumar Gopalakrishnan <sukumarg1973@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/vrf.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 7dc3bcac3506..67ecf2425b88 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -674,8 +674,9 @@ static struct sk_buff *vrf_ip_out(struct net_device *vrf_dev,
 				  struct sock *sk,
 				  struct sk_buff *skb)
 {
-	/* don't divert multicast */
-	if (ipv4_is_multicast(ip_hdr(skb)->daddr))
+	/* don't divert multicast or local broadcast */
+	if (ipv4_is_multicast(ip_hdr(skb)->daddr) ||
+	    ipv4_is_lbcast(ip_hdr(skb)->daddr))
 		return skb;
 
 	if (qdisc_tx_is_default(vrf_dev))
-- 
2.14.3


From 42d122fd6d6d1d2a2e8e0802fe731a1b01ebb699 Mon Sep 17 00:00:00 2001
From: Guillaume Nault <g.nault@alphalink.fr>
Date: Mon, 22 Jan 2018 18:06:37 +0100
Subject: [PATCH 13/37] pppoe: take ->needed_headroom of lower device into
 account on xmit

[ Upstream commit 02612bb05e51df8489db5e94d0cf8d1c81f87b0c ]

In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom
was probably fine before the introduction of ->needed_headroom in
commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom").

But now, virtual devices typically advertise the size of their overhead
in dev->needed_headroom, so we must also take it into account in
skb_reserve().
Allocation size of skb is also updated to take dev->needed_tailroom
into account and replace the arbitrary 32 bytes with the real size of
a PPPoE header.

This issue was discovered by syzbot, who connected a pppoe socket to a
gre device which had dev->header_ops->create == ipgre_header and
dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any
headroom, and dev_hard_header() crashed when ipgre_header() tried to
prepend its header to skb->data.

skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24
head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted
4.15.0-rc7-next-20180115+ #97
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100
RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282
RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000
RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc
RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0
R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180
FS:  00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  skb_under_panic net/core/skbuff.c:114 [inline]
  skb_push+0xce/0xf0 net/core/skbuff.c:1714
  ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879
  dev_hard_header include/linux/netdevice.h:2723 [inline]
  pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890
  sock_sendmsg_nosec net/socket.c:630 [inline]
  sock_sendmsg+0xca/0x110 net/socket.c:640
  sock_write_iter+0x31a/0x5d0 net/socket.c:909
  call_write_iter include/linux/fs.h:1775 [inline]
  do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
  do_iter_write+0x154/0x540 fs/read_write.c:932
  vfs_writev+0x18a/0x340 fs/read_write.c:977
  do_writev+0xfc/0x2a0 fs/read_write.c:1012
  SYSC_writev fs/read_write.c:1085 [inline]
  SyS_writev+0x27/0x30 fs/read_write.c:1082
  entry_SYSCALL_64_fastpath+0x29/0xa0

Admittedly PPPoE shouldn't be allowed to run on non Ethernet-like
interfaces, but reserving space for ->needed_headroom is a more
fundamental issue that needs to be addressed first.

Same problem exists for __pppoe_xmit(), which also needs to take
dev->needed_headroom into account in skb_cow_head().

Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom")
Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ppp/pppoe.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 4e1da1645b15..5aa59f41bf8c 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -842,6 +842,7 @@ static int pppoe_sendmsg(struct socket *sock, struct msghdr *m,
 	struct pppoe_hdr *ph;
 	struct net_device *dev;
 	char *start;
+	int hlen;
 
 	lock_sock(sk);
 	if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) {
@@ -860,16 +861,16 @@ static int pppoe_sendmsg(struct socket *sock, struct msghdr *m,
 	if (total_len > (dev->mtu + dev->hard_header_len))
 		goto end;
 
-
-	skb = sock_wmalloc(sk, total_len + dev->hard_header_len + 32,
-			   0, GFP_KERNEL);
+	hlen = LL_RESERVED_SPACE(dev);
+	skb = sock_wmalloc(sk, hlen + sizeof(*ph) + total_len +
+			   dev->needed_tailroom, 0, GFP_KERNEL);
 	if (!skb) {
 		error = -ENOMEM;
 		goto end;
 	}
 
 	/* Reserve space for headers. */
-	skb_reserve(skb, dev->hard_header_len);
+	skb_reserve(skb, hlen);
 	skb_reset_network_header(skb);
 
 	skb->dev = dev;
@@ -930,7 +931,7 @@ static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb)
 	/* Copy the data if there is no space for the header or if it's
 	 * read-only.
 	 */
-	if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len))
+	if (skb_cow_head(skb, LL_RESERVED_SPACE(dev) + sizeof(*ph)))
 		goto abort;
 
 	__skb_push(skb, sizeof(*ph));
-- 
2.14.3


From 72a94c0565fdda749a3872bad5cdd941628f0f02 Mon Sep 17 00:00:00 2001
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Fri, 26 Jan 2018 01:53:26 +0100
Subject: [PATCH 14/37] r8169: fix memory corruption on retrieval of hardware
 statistics.

[ Upstream commit a78e93661c5fd30b9e1dee464b2f62f966883ef7 ]

Hardware statistics retrieval hurts in tight invocation loops.

Avoid extraneous write and enforce strict ordering of writes targeted to
the tally counters dump area address registers.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Oliver Freyermuth <o.freyermuth@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/realtek/r8169.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 9541465e43e9..958ff931e790 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -2239,19 +2239,14 @@ static bool rtl8169_do_counters(struct net_device *dev, u32 counter_cmd)
 	void __iomem *ioaddr = tp->mmio_addr;
 	dma_addr_t paddr = tp->counters_phys_addr;
 	u32 cmd;
-	bool ret;
 
 	RTL_W32(CounterAddrHigh, (u64)paddr >> 32);
+	RTL_R32(CounterAddrHigh);
 	cmd = (u64)paddr & DMA_BIT_MASK(32);
 	RTL_W32(CounterAddrLow, cmd);
 	RTL_W32(CounterAddrLow, cmd | counter_cmd);
 
-	ret = rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000);
-
-	RTL_W32(CounterAddrLow, 0);
-	RTL_W32(CounterAddrHigh, 0);
-
-	return ret;
+	return rtl_udelay_loop_wait_low(tp, &rtl_counters_cond, 10, 1000);
 }
 
 static bool rtl8169_reset_counters(struct net_device *dev)
-- 
2.14.3


From 628a0e1991343859c4bb1b69e3bb84867c32d1ac Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 15 Jan 2018 17:02:00 +0800
Subject: [PATCH 15/37] sctp: do not allow the v4 socket to bind a v4mapped v6
 address

[ Upstream commit c5006b8aa74599ce19104b31d322d2ea9ff887cc ]

The check in sctp_sockaddr_af is not robust enough to forbid binding a
v4mapped v6 addr on a v4 socket.

The worse thing is that v4 socket's bind_verify would not convert this
v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4
socket bound a v6 addr.

This patch is to fix it by doing the common sa.sa_family check first,
then AF_INET check for v4mapped v6 addrs.

Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.")
Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sctp/socket.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index df806b8819aa..7b258bf60eca 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -334,16 +334,14 @@ static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
 	if (len < sizeof (struct sockaddr))
 		return NULL;
 
+	if (!opt->pf->af_supported(addr->sa.sa_family, opt))
+		return NULL;
+
 	/* V4 mapped address are really of AF_INET family */
 	if (addr->sa.sa_family == AF_INET6 &&
-	    ipv6_addr_v4mapped(&addr->v6.sin6_addr)) {
-		if (!opt->pf->af_supported(AF_INET, opt))
-			return NULL;
-	} else {
-		/* Does this PF support this AF? */
-		if (!opt->pf->af_supported(addr->sa.sa_family, opt))
-			return NULL;
-	}
+	    ipv6_addr_v4mapped(&addr->v6.sin6_addr) &&
+	    !opt->pf->af_supported(AF_INET, opt))
+		return NULL;
 
 	/* If we get this far, af is valid. */
 	af = sctp_get_af_specific(addr->sa.sa_family);
-- 
2.14.3


From 7e2aceceda0e82e70911d319f728afcc4217b89f Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 15 Jan 2018 17:01:36 +0800
Subject: [PATCH 16/37] sctp: return error if the asoc has been peeled off in
 sctp_wait_for_sndbuf

[ Upstream commit a0ff660058b88d12625a783ce9e5c1371c87951f ]

After commit cea0cc80a677 ("sctp: use the right sk after waking up from
wait_buf sleep"), it may change to lock another sk if the asoc has been
peeled off in sctp_wait_for_sndbuf.

However, the asoc's new sk could be already closed elsewhere, as it's in
the sendmsg context of the old sk that can't avoid the new sk's closing.
If the sk's last one refcnt is held by this asoc, later on after putting
this asoc, the new sk will be freed, while under it's own lock.

This patch is to revert that commit, but fix the old issue by returning
error under the old sk's lock.

Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep")
Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sctp/socket.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 7b258bf60eca..7d432266645c 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -84,7 +84,7 @@
 static int sctp_writeable(struct sock *sk);
 static void sctp_wfree(struct sk_buff *skb);
 static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
-				size_t msg_len, struct sock **orig_sk);
+				size_t msg_len);
 static int sctp_wait_for_packet(struct sock *sk, int *err, long *timeo_p);
 static int sctp_wait_for_connect(struct sctp_association *, long *timeo_p);
 static int sctp_wait_for_accept(struct sock *sk, long timeo);
@@ -1961,7 +1961,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
 	timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
 	if (!sctp_wspace(asoc)) {
 		/* sk can be changed by peel off when waiting for buf. */
-		err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len, &sk);
+		err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len);
 		if (err) {
 			if (err == -ESRCH) {
 				/* asoc is already dead. */
@@ -7825,12 +7825,12 @@ void sctp_sock_rfree(struct sk_buff *skb)
 
 /* Helper function to wait for space in the sndbuf.  */
 static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
-				size_t msg_len, struct sock **orig_sk)
+				size_t msg_len)
 {
 	struct sock *sk = asoc->base.sk;
-	int err = 0;
 	long current_timeo = *timeo_p;
 	DEFINE_WAIT(wait);
+	int err = 0;
 
 	pr_debug("%s: asoc:%p, timeo:%ld, msg_len:%zu\n", __func__, asoc,
 		 *timeo_p, msg_len);
@@ -7859,17 +7859,13 @@ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
 		release_sock(sk);
 		current_timeo = schedule_timeout(current_timeo);
 		lock_sock(sk);
-		if (sk != asoc->base.sk) {
-			release_sock(sk);
-			sk = asoc->base.sk;
-			lock_sock(sk);
-		}
+		if (sk != asoc->base.sk)
+			goto do_error;
 
 		*timeo_p = current_timeo;
 	}
 
 out:
-	*orig_sk = sk;
 	finish_wait(&asoc->wait, &wait);
 
 	/* Release the association's refcnt.  */
-- 
2.14.3


From afa2e86079a015c626268899f476e94f3c4338a6 Mon Sep 17 00:00:00 2001
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Wed, 10 Jan 2018 12:50:25 -0800
Subject: [PATCH 17/37] tipc: fix a memory leak in tipc_nl_node_get_link()

[ Upstream commit 59b36613e85fb16ebf9feaf914570879cd5c2a21 ]

When tipc_node_find_by_name() fails, the nlmsg is not
freed.

While on it, switch to a goto label to properly
free it.

Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/tipc/node.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index 198dbc7adbe1..f6c5743c170e 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1848,36 +1848,38 @@ int tipc_nl_node_get_link(struct sk_buff *skb, struct genl_info *info)
 
 	if (strcmp(name, tipc_bclink_name) == 0) {
 		err = tipc_nl_add_bc_link(net, &msg);
-		if (err) {
-			nlmsg_free(msg.skb);
-			return err;
-		}
+		if (err)
+			goto err_free;
 	} else {
 		int bearer_id;
 		struct tipc_node *node;
 		struct tipc_link *link;
 
 		node = tipc_node_find_by_name(net, name, &bearer_id);
-		if (!node)
-			return -EINVAL;
+		if (!node) {
+			err = -EINVAL;
+			goto err_free;
+		}
 
 		tipc_node_read_lock(node);
 		link = node->links[bearer_id].link;
 		if (!link) {
 			tipc_node_read_unlock(node);
-			nlmsg_free(msg.skb);
-			return -EINVAL;
+			err = -EINVAL;
+			goto err_free;
 		}
 
 		err = __tipc_nl_add_link(net, &msg, link, 0);
 		tipc_node_read_unlock(node);
-		if (err) {
-			nlmsg_free(msg.skb);
-			return err;
-		}
+		if (err)
+			goto err_free;
 	}
 
 	return genlmsg_reply(msg.skb, info);
+
+err_free:
+	nlmsg_free(msg.skb);
+	return err;
 }
 
 int tipc_nl_node_reset_link_stats(struct sk_buff *skb, struct genl_info *info)
-- 
2.14.3


From f02768ad4004f7ffc36e66cae89277a3a5a13a90 Mon Sep 17 00:00:00 2001
From: Eran Ben Elisha <eranbe@mellanox.com>
Date: Tue, 9 Jan 2018 11:41:10 +0200
Subject: [PATCH 18/37] {net,ib}/mlx5: Don't disable local loopback multicast
 traffic when needed

[ Upstream commit 8978cc921fc7fad3f4d6f91f1da01352aeeeff25 ]

There are systems platform information management interfaces (such as
HOST2BMC) for which we cannot disable local loopback multicast traffic.

Separate disable_local_lb_mc and disable_local_lb_uc capability bits so
driver will not disable multicast loopback traffic if not supported.
(It is expected that Firmware will not set disable_local_lb_mc if
HOST2BMC is running for example.)

Function mlx5_nic_vport_update_local_lb will do best effort to
disable/enable UC/MC loopback traffic and return success only in case it
succeeded to changed all allowed by Firmware.

Adapt mlx5_ib and mlx5e to support the new cap bits.

Fixes: 2c43c5a036be ("net/mlx5e: Enable local loopback in loopback selftest")
Fixes: c85023e153e3 ("IB/mlx5: Add raw ethernet local loopback support")
Fixes: bded747bb432 ("net/mlx5: Add raw ethernet local loopback firmware command")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c                  |  9 +++++---
 .../net/ethernet/mellanox/mlx5/core/en_selftest.c  | 27 ++++++++++++++--------
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |  3 +--
 drivers/net/ethernet/mellanox/mlx5/core/vport.c    | 22 +++++++++++++-----
 include/linux/mlx5/mlx5_ifc.h                      |  5 ++--
 5 files changed, 44 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 30d479f87cb8..fb5302ee57c7 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1276,7 +1276,8 @@ static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn)
 		return err;
 
 	if ((MLX5_CAP_GEN(dev->mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH) ||
-	    !MLX5_CAP_GEN(dev->mdev, disable_local_lb))
+	    (!MLX5_CAP_GEN(dev->mdev, disable_local_lb_uc) &&
+	     !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
 		return err;
 
 	mutex_lock(&dev->lb_mutex);
@@ -1294,7 +1295,8 @@ static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn)
 	mlx5_core_dealloc_transport_domain(dev->mdev, tdn);
 
 	if ((MLX5_CAP_GEN(dev->mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH) ||
-	    !MLX5_CAP_GEN(dev->mdev, disable_local_lb))
+	    (!MLX5_CAP_GEN(dev->mdev, disable_local_lb_uc) &&
+	     !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
 		return;
 
 	mutex_lock(&dev->lb_mutex);
@@ -4161,7 +4163,8 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	}
 
 	if ((MLX5_CAP_GEN(mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH) &&
-	    MLX5_CAP_GEN(mdev, disable_local_lb))
+	    (MLX5_CAP_GEN(mdev, disable_local_lb_uc) ||
+	     MLX5_CAP_GEN(mdev, disable_local_lb_mc)))
 		mutex_init(&dev->lb_mutex);
 
 	dev->ib_active = true;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
index 1f1f8af87d4d..5a4608281f38 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -238,15 +238,19 @@ static int mlx5e_test_loopback_setup(struct mlx5e_priv *priv,
 	int err = 0;
 
 	/* Temporarily enable local_lb */
-	if (MLX5_CAP_GEN(priv->mdev, disable_local_lb)) {
-		mlx5_nic_vport_query_local_lb(priv->mdev, &lbtp->local_lb);
-		if (!lbtp->local_lb)
-			mlx5_nic_vport_update_local_lb(priv->mdev, true);
+	err = mlx5_nic_vport_query_local_lb(priv->mdev, &lbtp->local_lb);
+	if (err)
+		return err;
+
+	if (!lbtp->local_lb) {
+		err = mlx5_nic_vport_update_local_lb(priv->mdev, true);
+		if (err)
+			return err;
 	}
 
 	err = mlx5e_refresh_tirs(priv, true);
 	if (err)
-		return err;
+		goto out;
 
 	lbtp->loopback_ok = false;
 	init_completion(&lbtp->comp);
@@ -256,16 +260,21 @@ static int mlx5e_test_loopback_setup(struct mlx5e_priv *priv,
 	lbtp->pt.dev = priv->netdev;
 	lbtp->pt.af_packet_priv = lbtp;
 	dev_add_pack(&lbtp->pt);
+
+	return 0;
+
+out:
+	if (!lbtp->local_lb)
+		mlx5_nic_vport_update_local_lb(priv->mdev, false);
+
 	return err;
 }
 
 static void mlx5e_test_loopback_cleanup(struct mlx5e_priv *priv,
 					struct mlx5e_lbt_priv *lbtp)
 {
-	if (MLX5_CAP_GEN(priv->mdev, disable_local_lb)) {
-		if (!lbtp->local_lb)
-			mlx5_nic_vport_update_local_lb(priv->mdev, false);
-	}
+	if (!lbtp->local_lb)
+		mlx5_nic_vport_update_local_lb(priv->mdev, false);
 
 	dev_remove_pack(&lbtp->pt);
 	mlx5e_refresh_tirs(priv, false);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 8bfc37e4ec87..4ddd632d10f9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -577,8 +577,7 @@ static int mlx5_core_set_hca_defaults(struct mlx5_core_dev *dev)
 	int ret = 0;
 
 	/* Disable local_lb by default */
-	if ((MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH) &&
-	    MLX5_CAP_GEN(dev, disable_local_lb))
+	if (MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH)
 		ret = mlx5_nic_vport_update_local_lb(dev, false);
 
 	return ret;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vport.c b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
index d653b0025b13..a1296a62497d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/vport.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/vport.c
@@ -908,23 +908,33 @@ int mlx5_nic_vport_update_local_lb(struct mlx5_core_dev *mdev, bool enable)
 	void *in;
 	int err;
 
-	mlx5_core_dbg(mdev, "%s local_lb\n", enable ? "enable" : "disable");
+	if (!MLX5_CAP_GEN(mdev, disable_local_lb_mc) &&
+	    !MLX5_CAP_GEN(mdev, disable_local_lb_uc))
+		return 0;
+
 	in = kvzalloc(inlen, GFP_KERNEL);
 	if (!in)
 		return -ENOMEM;
 
-	MLX5_SET(modify_nic_vport_context_in, in,
-		 field_select.disable_mc_local_lb, 1);
 	MLX5_SET(modify_nic_vport_context_in, in,
 		 nic_vport_context.disable_mc_local_lb, !enable);
-
-	MLX5_SET(modify_nic_vport_context_in, in,
-		 field_select.disable_uc_local_lb, 1);
 	MLX5_SET(modify_nic_vport_context_in, in,
 		 nic_vport_context.disable_uc_local_lb, !enable);
 
+	if (MLX5_CAP_GEN(mdev, disable_local_lb_mc))
+		MLX5_SET(modify_nic_vport_context_in, in,
+			 field_select.disable_mc_local_lb, 1);
+
+	if (MLX5_CAP_GEN(mdev, disable_local_lb_uc))
+		MLX5_SET(modify_nic_vport_context_in, in,
+			 field_select.disable_uc_local_lb, 1);
+
 	err = mlx5_modify_nic_vport_context(mdev, in, inlen);
 
+	if (!err)
+		mlx5_core_dbg(mdev, "%s local_lb\n",
+			      enable ? "enable" : "disable");
+
 	kvfree(in);
 	return err;
 }
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index c8091f06eaa4..f3765155fa4d 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1023,8 +1023,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         log_max_wq_sz[0x5];
 
 	u8         nic_vport_change_event[0x1];
-	u8         disable_local_lb[0x1];
-	u8         reserved_at_3e2[0x9];
+	u8         disable_local_lb_uc[0x1];
+	u8         disable_local_lb_mc[0x1];
+	u8         reserved_at_3e3[0x8];
 	u8         log_max_vlan_list[0x5];
 	u8         reserved_at_3f0[0x3];
 	u8         log_max_current_mc_list[0x5];
-- 
2.14.3


From 8d6faa3bee449366aa916e84b1fff702da5e0535 Mon Sep 17 00:00:00 2001
From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 4 Jan 2018 04:35:51 +0200
Subject: [PATCH 19/37] net/mlx5: Fix get vector affinity helper function

[ Upstream commit 05e0cc84e00c54fb152d1f4b86bc211823a83d0c ]

mlx5_get_vector_affinity used to call pci_irq_get_affinity and after
reverting the patch that sets the device affinity via PCI_IRQ_AFFINITY
API, calling pci_irq_get_affinity becomes useless and it breaks RDMA
mlx5 users.  To fix this, this patch provides an alternative way to
retrieve IRQ vector affinity using legacy IRQ API, following
smp_affinity read procfs implementation.

Fixes: 231243c82793 ("Revert mlx5: move affinity hints assignments to generic code")
Fixes: a435393acafb ("mlx5: move affinity hints assignments to generic code")
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/driver.h | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 8b3d0103c03a..a13525daf09b 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -36,6 +36,7 @@
 #include <linux/kernel.h>
 #include <linux/completion.h>
 #include <linux/pci.h>
+#include <linux/irq.h>
 #include <linux/spinlock_types.h>
 #include <linux/semaphore.h>
 #include <linux/slab.h>
@@ -1194,7 +1195,23 @@ enum {
 static inline const struct cpumask *
 mlx5_get_vector_affinity(struct mlx5_core_dev *dev, int vector)
 {
-	return pci_irq_get_affinity(dev->pdev, MLX5_EQ_VEC_COMP_BASE + vector);
+	const struct cpumask *mask;
+	struct irq_desc *desc;
+	unsigned int irq;
+	int eqn;
+	int err;
+
+	err = mlx5_vector2eqn(dev, vector, &eqn, &irq);
+	if (err)
+		return NULL;
+
+	desc = irq_to_desc(irq);
+#ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
+	mask = irq_data_get_effective_affinity_mask(&desc->irq_data);
+#else
+	mask = desc->irq_common_data.affinity;
+#endif
+	return mask;
 }
 
 #endif /* MLX5_DRIVER_H */
-- 
2.14.3


From 6c5b2e5598df7896f7f4fba1e466f81493e5eaac Mon Sep 17 00:00:00 2001
From: Guillaume Nault <g.nault@alphalink.fr>
Date: Wed, 10 Jan 2018 16:24:45 +0100
Subject: [PATCH 20/37] ppp: unlock all_ppp_mutex before registering device

[ Upstream commit 0171c41835591e9aa2e384b703ef9a6ae367c610 ]

ppp_dev_uninit(), which is the .ndo_uninit() handler of PPP devices,
needs to lock pn->all_ppp_mutex. Therefore we mustn't call
register_netdevice() with pn->all_ppp_mutex already locked, or we'd
deadlock in case register_netdevice() fails and calls .ndo_uninit().

Fortunately, we can unlock pn->all_ppp_mutex before calling
register_netdevice(). This lock protects pn->units_idr, which isn't
used in the device registration process.

However, keeping pn->all_ppp_mutex locked during device registration
did ensure that no device in transient state would be published in
pn->units_idr. In practice, unlocking it before calling
register_netdevice() doesn't change this property: ppp_unit_register()
is called with 'ppp_mutex' locked and all searches done in
pn->units_idr hold this lock too.

Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion")
Reported-and-tested-by: syzbot+367889b9c9e279219175@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ppp/ppp_generic.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index bf14c51f35e1..8c6b8918ec31 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -1003,17 +1003,18 @@ static int ppp_unit_register(struct ppp *ppp, int unit, bool ifname_is_set)
 	if (!ifname_is_set)
 		snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
 
+	mutex_unlock(&pn->all_ppp_mutex);
+
 	ret = register_netdevice(ppp->dev);
 	if (ret < 0)
 		goto err_unit;
 
 	atomic_inc(&ppp_unit_count);
 
-	mutex_unlock(&pn->all_ppp_mutex);
-
 	return 0;
 
 err_unit:
+	mutex_lock(&pn->all_ppp_mutex);
 	unit_put(&pn->units_idr, ppp->file.index);
 err:
 	mutex_unlock(&pn->all_ppp_mutex);
-- 
2.14.3


From 528a555a2aca08e4a7cb357341665bbb939e6eb9 Mon Sep 17 00:00:00 2001
From: Ivan Vecera <cera@cera.cz>
Date: Fri, 19 Jan 2018 20:23:50 +0100
Subject: [PATCH 21/37] be2net: restore properly promisc mode after queues
 reconfiguration

[ Upstream commit 52acf06451930eb4cefabd5ecea56e2d46c32f76 ]

The commit 622190669403 ("be2net: Request RSS capability of Rx interface
depending on number of Rx rings") modified be_update_queues() so the
IFACE (HW representation of the netdevice) is destroyed and then
re-created. This causes a regression because potential promiscuous mode
is not restored properly during be_open() because the driver thinks
that the HW has promiscuous mode already enabled.

Note that Lancer is not affected by this bug because RX-filter flags are
disabled during be_close() for this chipset.

Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>

Fixes: 622190669403 ("be2net: Request RSS capability of Rx interface depending on number of Rx rings")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 0e3d9f39a807..1b03c32afc1f 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4634,6 +4634,15 @@ int be_update_queues(struct be_adapter *adapter)
 
 	be_schedule_worker(adapter);
 
+	/*
+	 * The IF was destroyed and re-created. We need to clear
+	 * all promiscuous flags valid for the destroyed IF.
+	 * Without this promisc mode is not restored during
+	 * be_open() because the driver thinks that it is
+	 * already enabled in HW.
+	 */
+	adapter->if_flags &= ~BE_IF_FLAGS_ALL_PROMISCUOUS;
+
 	if (netif_running(netdev))
 		status = be_open(netdev);
 
-- 
2.14.3


From a1ab92e8b25aa4d83232fa4ab7f0616497994671 Mon Sep 17 00:00:00 2001
From: Alexey Kodanev <alexey.kodanev@oracle.com>
Date: Thu, 18 Jan 2018 20:51:12 +0300
Subject: [PATCH 22/37] ip6_gre: init dev->mtu and dev->hard_header_len
 correctly

[ Upstream commit 128bb975dc3c25d00de04e503e2fe0a780d04459 ]

Commit b05229f44228 ("gre6: Cleanup GREv6 transmit path,
call common GRE functions") moved dev->mtu initialization
from ip6gre_tunnel_setup() to ip6gre_tunnel_init(), as a
result, the previously set values, before ndo_init(), are
reset in the following cases:

* rtnl_create_link() can update dev->mtu from IFLA_MTU
  parameter.

* ip6gre_tnl_link_config() is invoked before ndo_init() in
  netlink and ioctl setup, so ndo_init() can reset MTU
  adjustments with the lower device MTU as well, dev->mtu
  and dev->hard_header_len.

  Not applicable for ip6gretap because it has one more call
  to ip6gre_tnl_link_config(tunnel, 1) in ip6gre_tap_init().

Fix the first case by updating dev->mtu with 'tb[IFLA_MTU]'
parameter if a user sets it manually on a device creation,
and fix the second one by moving ip6gre_tnl_link_config()
call after register_netdevice().

Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
Fixes: db2ec95d1ba4 ("ip6_gre: Fix MTU setting")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/ip6_gre.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 7a2df6646486..5b4870caf268 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -337,11 +337,12 @@ static struct ip6_tnl *ip6gre_tunnel_locate(struct net *net,
 
 	nt->dev = dev;
 	nt->net = dev_net(dev);
-	ip6gre_tnl_link_config(nt, 1);
 
 	if (register_netdevice(dev) < 0)
 		goto failed_free;
 
+	ip6gre_tnl_link_config(nt, 1);
+
 	/* Can use a lockless transmit, unless we generate output sequences */
 	if (!(nt->parms.o_flags & TUNNEL_SEQ))
 		dev->features |= NETIF_F_LLTX;
@@ -1307,7 +1308,6 @@ static void ip6gre_netlink_parms(struct nlattr *data[],
 
 static int ip6gre_tap_init(struct net_device *dev)
 {
-	struct ip6_tnl *tunnel;
 	int ret;
 
 	ret = ip6gre_tunnel_init_common(dev);
@@ -1316,10 +1316,6 @@ static int ip6gre_tap_init(struct net_device *dev)
 
 	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
 
-	tunnel = netdev_priv(dev);
-
-	ip6gre_tnl_link_config(tunnel, 1);
-
 	return 0;
 }
 
@@ -1411,12 +1407,16 @@ static int ip6gre_newlink(struct net *src_net, struct net_device *dev,
 
 	nt->dev = dev;
 	nt->net = dev_net(dev);
-	ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
 
 	err = register_netdevice(dev);
 	if (err)
 		goto out;
 
+	ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
+
+	if (tb[IFLA_MTU])
+		ip6_tnl_change_mtu(dev, nla_get_u32(tb[IFLA_MTU]));
+
 	dev_hold(dev);
 	ip6gre_tunnel_link(ign, nt);
 
-- 
2.14.3


From 077f7eb2f8ae33387d0bfad8e8cfdd5de60ab38b Mon Sep 17 00:00:00 2001
From: Willem de Bruijn <willemb@google.com>
Date: Fri, 19 Jan 2018 09:29:18 -0500
Subject: [PATCH 23/37] gso: validate gso_type in GSO handlers

[ Upstream commit 121d57af308d0cf943f08f4738d24d3966c38cd9 ]

Validate gso_type during segmentation as SKB_GSO_DODGY sources
may pass packets where the gso_type does not match the contents.

Syzkaller was able to enter the SCTP gso handler with a packet of
gso_type SKB_GSO_TCPV4.

On entry of transport layer gso handlers, verify that the gso_type
matches the transport protocol.

Fixes: 90017accff61 ("sctp: Add GSO support")
Link: http://lkml.kernel.org/r/<001a1137452496ffc305617e5fe0@google.com>
Reported-by: syzbot+fee64147a25aecd48055@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/esp4_offload.c  | 3 +++
 net/ipv4/tcp_offload.c   | 3 +++
 net/ipv4/udp_offload.c   | 3 +++
 net/ipv6/esp6_offload.c  | 3 +++
 net/ipv6/tcpv6_offload.c | 3 +++
 net/ipv6/udp_offload.c   | 3 +++
 net/sctp/offload.c       | 3 +++
 7 files changed, 21 insertions(+)

diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c
index f8b918c766b0..56c49623bb9d 100644
--- a/net/ipv4/esp4_offload.c
+++ b/net/ipv4/esp4_offload.c
@@ -121,6 +121,9 @@ static struct sk_buff *esp4_gso_segment(struct sk_buff *skb,
 	if (!xo)
 		goto out;
 
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_ESP))
+		goto out;
+
 	seq = xo->seq.low;
 
 	x = skb->sp->xvec[skb->sp->len - 1];
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index b6a2aa1dcf56..4d58e2ce0b5b 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -32,6 +32,9 @@ static void tcp_gso_tstamp(struct sk_buff *skb, unsigned int ts_seq,
 static struct sk_buff *tcp4_gso_segment(struct sk_buff *skb,
 					netdev_features_t features)
 {
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4))
+		return ERR_PTR(-EINVAL);
+
 	if (!pskb_may_pull(skb, sizeof(struct tcphdr)))
 		return ERR_PTR(-EINVAL);
 
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 01801b77bd0d..ea6e6e7df0ee 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -203,6 +203,9 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 		goto out;
 	}
 
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP))
+		goto out;
+
 	if (!pskb_may_pull(skb, sizeof(struct udphdr)))
 		goto out;
 
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c
index 333a478aa161..1ea9d794447e 100644
--- a/net/ipv6/esp6_offload.c
+++ b/net/ipv6/esp6_offload.c
@@ -148,6 +148,9 @@ static struct sk_buff *esp6_gso_segment(struct sk_buff *skb,
 	if (!xo)
 		goto out;
 
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_ESP))
+		goto out;
+
 	seq = xo->seq.low;
 
 	x = skb->sp->xvec[skb->sp->len - 1];
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index d883c9204c01..278e49cd67d4 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -46,6 +46,9 @@ static struct sk_buff *tcp6_gso_segment(struct sk_buff *skb,
 {
 	struct tcphdr *th;
 
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6))
+		return ERR_PTR(-EINVAL);
+
 	if (!pskb_may_pull(skb, sizeof(*th)))
 		return ERR_PTR(-EINVAL);
 
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index a0f89ad76f9d..2a04dc9c781b 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -42,6 +42,9 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 		const struct ipv6hdr *ipv6h;
 		struct udphdr *uh;
 
+		if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP))
+			goto out;
+
 		if (!pskb_may_pull(skb, sizeof(struct udphdr)))
 			goto out;
 
diff --git a/net/sctp/offload.c b/net/sctp/offload.c
index 275925b93b29..35bc7106d182 100644
--- a/net/sctp/offload.c
+++ b/net/sctp/offload.c
@@ -45,6 +45,9 @@ static struct sk_buff *sctp_gso_segment(struct sk_buff *skb,
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	struct sctphdr *sh;
 
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_SCTP))
+		goto out;
+
 	sh = sctp_hdr(skb);
 	if (!pskb_may_pull(skb, sizeof(*sh)))
 		goto out;
-- 
2.14.3


From 4629464d5d5afda23eb202bb6b8efc25080c895b Mon Sep 17 00:00:00 2001
From: Yuval Mintz <yuvalm@mellanox.com>
Date: Wed, 24 Jan 2018 10:02:09 +0100
Subject: [PATCH 24/37] mlxsw: spectrum_router: Don't log an error on missing
 neighbor

[ Upstream commit 1ecdaea02ca6bfacf2ecda500dc1af51e9780c42 ]

Driver periodically samples all neighbors configured in device
in order to update the kernel regarding their state. When finding
an entry configured in HW that doesn't show in neigh_lookup()
driver logs an error message.
This introduces a race when removing multiple neighbors -
it's possible that a given entry would still be configured in HW
as its removal is still being processed but is already removed
from the kernel's neighbor tables.

Simply remove the error message and gracefully accept such events.

Fixes: c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Fixes: 60f040ca11b9 ("mlxsw: spectrum_router: Periodically dump active IPv6 neighbours")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index c23cc51bb5a5..7bef80676464 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -1531,11 +1531,8 @@ static void mlxsw_sp_router_neigh_ent_ipv4_process(struct mlxsw_sp *mlxsw_sp,
 	dipn = htonl(dip);
 	dev = mlxsw_sp->router->rifs[rif]->dev;
 	n = neigh_lookup(&arp_tbl, &dipn, dev);
-	if (!n) {
-		netdev_err(dev, "Failed to find matching neighbour for IP=%pI4h\n",
-			   &dip);
+	if (!n)
 		return;
-	}
 
 	netdev_dbg(dev, "Updating neighbour with IP=%pI4h\n", &dip);
 	neigh_event_send(n, NULL);
@@ -1562,11 +1559,8 @@ static void mlxsw_sp_router_neigh_ent_ipv6_process(struct mlxsw_sp *mlxsw_sp,
 
 	dev = mlxsw_sp->router->rifs[rif]->dev;
 	n = neigh_lookup(&nd_tbl, &dip, dev);
-	if (!n) {
-		netdev_err(dev, "Failed to find matching neighbour for IP=%pI6c\n",
-			   &dip);
+	if (!n)
 		return;
-	}
 
 	netdev_dbg(dev, "Updating neighbour with IP=%pI6c\n", &dip);
 	neigh_event_send(n, NULL);
-- 
2.14.3


From 3d4f93fbb5dfe876f4ac56f4125feea590587ec9 Mon Sep 17 00:00:00 2001
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 15 Jan 2018 11:37:29 -0800
Subject: [PATCH 25/37] tun: fix a memory leak for tfile->tx_array

[ Upstream commit 4df0bfc79904b7169dc77dcce44598b1545721f9 ]

tfile->tun could be detached before we close the tun fd,
via tun_detach_all(), so it should not be used to check for
tfile->tx_array.

As Jason suggested, we probably have to clean it up
unconditionally both in __tun_deatch() and tun_detach_all(),
but this requires to check if it is initialized or not.
Currently skb_array_cleanup() doesn't have such a check,
so I check it in the caller and introduce a helper function,
it is a bit ugly but we can always improve it in net-next.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 1576d9860599 ("tun: switch to use skb array for tx")
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/tun.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index c91b110f2169..fa51b7b0e9ea 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -534,6 +534,14 @@ static void tun_queue_purge(struct tun_file *tfile)
 	skb_queue_purge(&tfile->sk.sk_error_queue);
 }
 
+static void tun_cleanup_tx_array(struct tun_file *tfile)
+{
+	if (tfile->tx_array.ring.queue) {
+		skb_array_cleanup(&tfile->tx_array);
+		memset(&tfile->tx_array, 0, sizeof(tfile->tx_array));
+	}
+}
+
 static void __tun_detach(struct tun_file *tfile, bool clean)
 {
 	struct tun_file *ntfile;
@@ -575,8 +583,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
 			    tun->dev->reg_state == NETREG_REGISTERED)
 				unregister_netdevice(tun->dev);
 		}
-		if (tun)
-			skb_array_cleanup(&tfile->tx_array);
+		tun_cleanup_tx_array(tfile);
 		sock_put(&tfile->sk);
 	}
 }
@@ -616,11 +623,13 @@ static void tun_detach_all(struct net_device *dev)
 		/* Drop read queue */
 		tun_queue_purge(tfile);
 		sock_put(&tfile->sk);
+		tun_cleanup_tx_array(tfile);
 	}
 	list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) {
 		tun_enable_queue(tfile);
 		tun_queue_purge(tfile);
 		sock_put(&tfile->sk);
+		tun_cleanup_tx_array(tfile);
 	}
 	BUG_ON(tun->numdisabled != 0);
 
@@ -2624,6 +2633,8 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 
 	sock_set_flag(&tfile->sk, SOCK_ZEROCOPY);
 
+	memset(&tfile->tx_array, 0, sizeof(tfile->tx_array));
+
 	return 0;
 }
 
-- 
2.14.3


From d699289b05fc98fa2ade4ee5d3072241aed92bcc Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 17 Jan 2018 14:21:13 -0800
Subject: [PATCH 26/37] flow_dissector: properly cap thoff field

[ Upstream commit d0c081b49137cd3200f2023c0875723be66e7ce5 ]

syzbot reported yet another crash [1] that is caused by
insufficient validation of DODGY packets.

Two bugs are happening here to trigger the crash.

1) Flow dissection leaves with incorrect thoff field.

2) skb_probe_transport_header() sets transport header to this invalid
thoff, even if pointing after skb valid data.

3) qdisc_pkt_len_init() reads out-of-bound data because it
trusts tcp_hdrlen(skb)

Possible fixes :

- Full flow dissector validation before injecting bad DODGY packets in
the stack.
 This approach was attempted here : https://patchwork.ozlabs.org/patch/
861874/

- Have more robust functions in the core.
  This might be needed anyway for stable versions.

This patch fixes the flow dissection issue.

[1]
CPU: 1 PID: 3144 Comm: syzkaller271204 Not tainted 4.15.0-rc4-mm1+ #49
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x257 lib/dump_stack.c:53
 print_address_description+0x73/0x250 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:355 [inline]
 kasan_report+0x23b/0x360 mm/kasan/report.c:413
 __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:432
 __tcp_hdrlen include/linux/tcp.h:35 [inline]
 tcp_hdrlen include/linux/tcp.h:40 [inline]
 qdisc_pkt_len_init net/core/dev.c:3160 [inline]
 __dev_queue_xmit+0x20d3/0x2200 net/core/dev.c:3465
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3554
 packet_snd net/packet/af_packet.c:2943 [inline]
 packet_sendmsg+0x3ad5/0x60a0 net/packet/af_packet.c:2968
 sock_sendmsg_nosec net/socket.c:628 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:638
 sock_write_iter+0x31a/0x5d0 net/socket.c:907
 call_write_iter include/linux/fs.h:1776 [inline]
 new_sync_write fs/read_write.c:469 [inline]
 __vfs_write+0x684/0x970 fs/read_write.c:482
 vfs_write+0x189/0x510 fs/read_write.c:544
 SYSC_write fs/read_write.c:589 [inline]
 SyS_write+0xef/0x220 fs/read_write.c:581
 entry_SYSCALL_64_fastpath+0x1f/0x96

Fixes: 34fad54c2537 ("net: __skb_flow_dissect() must cap its return value")
Fixes: a6e544b0a88b ("flow_dissector: Jump to exit code in __skb_flow_dissect")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/flow_dissector.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 0a977373d003..f950b80c0dd1 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -876,8 +876,8 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 out_good:
 	ret = true;
 
-	key_control->thoff = (u16)nhoff;
 out:
+	key_control->thoff = min_t(u16, nhoff, skb ? skb->len : hlen);
 	key_basic->n_proto = proto;
 	key_basic->ip_proto = ip_proto;
 
@@ -885,7 +885,6 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 
 out_bad:
 	ret = false;
-	key_control->thoff = min_t(u16, nhoff, skb ? skb->len : hlen);
 	goto out;
 }
 EXPORT_SYMBOL(__skb_flow_dissect);
-- 
2.14.3


From c72a1a54bf44754eed36c8a1d483153479b66daf Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 15 Jan 2018 17:01:19 +0800
Subject: [PATCH 27/37] sctp: reinit stream if stream outcnt has been change by
 sinit in sendmsg

[ Upstream commit 625637bf4afa45204bd87e4218645182a919485a ]

After introducing sctp_stream structure, sctp uses stream->outcnt as the
out stream nums instead of c.sinit_num_ostreams.

However when users use sinit in cmsg, it only updates c.sinit_num_ostreams
in sctp_sendmsg. At that moment, stream->outcnt is still using previous
value. If it's value is not updated, the sinit_num_ostreams of sinit could
not really work.

This patch is to fix it by updating stream->outcnt and reiniting stream
if stream outcnt has been change by sinit in sendmsg.

Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sctp/socket.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 7d432266645c..1c08d86efe94 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1880,8 +1880,14 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
 		 */
 		if (sinit) {
 			if (sinit->sinit_num_ostreams) {
-				asoc->c.sinit_num_ostreams =
-					sinit->sinit_num_ostreams;
+				__u16 outcnt = sinit->sinit_num_ostreams;
+
+				asoc->c.sinit_num_ostreams = outcnt;
+				/* outcnt has been changed, so re-init stream */
+				err = sctp_stream_init(&asoc->stream, outcnt, 0,
+						       GFP_KERNEL);
+				if (err)
+					goto out_free;
 			}
 			if (sinit->sinit_max_instreams) {
 				asoc->c.sinit_max_instreams =
-- 
2.14.3


From ecd6e05108602ce98c7347f73d034e1df5dc27a7 Mon Sep 17 00:00:00 2001
From: David Ahern <dsahern@gmail.com>
Date: Wed, 10 Jan 2018 13:00:39 -0800
Subject: [PATCH 28/37] netlink: extack needs to be reset each time through
 loop

[ Upstream commit cbbdf8433a5f117b1a2119ea30fc651b61ef7570 ]

syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value.
The problem is that netlink_rcv_skb loops over the skb repeatedly invoking
the callback and without resetting the extack leaving potentially stale
data. Initializing each time through avoids the WARN_ON.

Fixes: 2d4bc93368f5a ("netlink: extended ACK reporting")
Reported-by: syzbot+315fa6766d0f7c359327@syzkaller.appspotmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/netlink/af_netlink.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index aac9d68b4636..f4eb3fe59fe5 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2393,7 +2393,7 @@ int netlink_rcv_skb(struct sk_buff *skb, int (*cb)(struct sk_buff *,
 						   struct nlmsghdr *,
 						   struct netlink_ext_ack *))
 {
-	struct netlink_ext_ack extack = {};
+	struct netlink_ext_ack extack;
 	struct nlmsghdr *nlh;
 	int err;
 
@@ -2414,6 +2414,7 @@ int netlink_rcv_skb(struct sk_buff *skb, int (*cb)(struct sk_buff *,
 		if (nlh->nlmsg_type < NLMSG_MIN_TYPE)
 			goto ack;
 
+		memset(&extack, 0, sizeof(extack));
 		err = cb(skb, nlh, &extack);
 		if (err == -EINTR)
 			goto skip;
-- 
2.14.3


From 0eb6b30b05f0197d1194431594ee55e2800fd5f4 Mon Sep 17 00:00:00 2001
From: Talat Batheesh <talatb@mellanox.com>
Date: Sun, 21 Jan 2018 05:30:42 +0200
Subject: [PATCH 29/37] net/mlx5e: Fix fixpoint divide exception in
 mlx5e_am_stats_compare

[ Upstream commit e58edaa4863583b54409444f11b4f80dff0af1cd ]

Helmut reported a bug about division by zero while
running traffic and doing physical cable pull test.

When the cable unplugged the ppms become zero, so when
dividing the current ppms by the previous ppms in the
next dim iteration there is division by zero.

This patch prevent this division for both ppms and epms.

Fixes: c3164d2fc48f ("net/mlx5e: Added BW check for DIM decision mechanism")
Reported-by: Helmut Grauer <helmut.grauer@de.ibm.com>
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index acf32fe952cd..3d3b1f97dc27 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -197,9 +197,15 @@ static int mlx5e_am_stats_compare(struct mlx5e_rx_am_stats *curr,
 		return (curr->bpms > prev->bpms) ? MLX5E_AM_STATS_BETTER :
 						   MLX5E_AM_STATS_WORSE;
 
+	if (!prev->ppms)
+		return curr->ppms ? MLX5E_AM_STATS_BETTER :
+				    MLX5E_AM_STATS_SAME;
+
 	if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
 		return (curr->ppms > prev->ppms) ? MLX5E_AM_STATS_BETTER :
 						   MLX5E_AM_STATS_WORSE;
+	if (!prev->epms)
+		return MLX5E_AM_STATS_SAME;
 
 	if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
 		return (curr->epms < prev->epms) ? MLX5E_AM_STATS_BETTER :
-- 
2.14.3


From 546f2f23d23f65389243eda5f2c1f3cc8680d76a Mon Sep 17 00:00:00 2001
From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Mon, 15 Jan 2018 11:47:53 -0800
Subject: [PATCH 30/37] nfp: use the correct index for link speed table

[ Upstream commit 0d9c9f0f40ca262b67fc06a702b85f3976f5e1a1 ]

sts variable is holding link speed as well as state.  We should
be using ls to index into ls_to_ethtool.

Fixes: 265aeb511bd5 ("nfp: add support for .get_link_ksettings()")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index dc016dfec64d..8e623d8fa78e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -306,7 +306,7 @@ nfp_net_get_link_ksettings(struct net_device *netdev,
 	    ls >= ARRAY_SIZE(ls_to_ethtool))
 		return 0;
 
-	cmd->base.speed = ls_to_ethtool[sts];
+	cmd->base.speed = ls_to_ethtool[ls];
 	cmd->base.duplex = DUPLEX_FULL;
 
 	return 0;
-- 
2.14.3


From 61a7b725cbbc6640629bb6b702fcf7769d6a6168 Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin@gmail.com>
Date: Thu, 18 Jan 2018 14:48:03 +0800
Subject: [PATCH 31/37] netlink: reset extack earlier in netlink_rcv_skb

[ Upstream commit cd443f1e91ca600a092e780e8250cd6a2954b763 ]

Move up the extack reset/initialization in netlink_rcv_skb, so that
those 'goto ack' will not skip it. Otherwise, later on netlink_ack
may use the uninitialized extack and cause kernel crash.

Fixes: cbbdf8433a5f ("netlink: extack needs to be reset each time through loop")
Reported-by: syzbot+03bee3680a37466775e7@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/netlink/af_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index f4eb3fe59fe5..533fd0503ba0 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2400,6 +2400,7 @@ int netlink_rcv_skb(struct sk_buff *skb, int (*cb)(struct sk_buff *,
 	while (skb->len >= nlmsg_total_size(0)) {
 		int msglen;
 
+		memset(&extack, 0, sizeof(extack));
 		nlh = nlmsg_hdr(skb);
 		err = 0;
 
@@ -2414,7 +2415,6 @@ int netlink_rcv_skb(struct sk_buff *skb, int (*cb)(struct sk_buff *,
 		if (nlh->nlmsg_type < NLMSG_MIN_TYPE)
 			goto ack;
 
-		memset(&extack, 0, sizeof(extack));
 		err = cb(skb, nlh, &extack);
 		if (err == -EINTR)
 			goto skip;
-- 
2.14.3


From 4df4e296d2883583f917b9d075817362c938f94b Mon Sep 17 00:00:00 2001
From: Ilya Lesokhin <ilyal@mellanox.com>
Date: Tue, 16 Jan 2018 15:31:52 +0200
Subject: [PATCH 32/37] net/tls: Only attach to sockets in ESTABLISHED state

[ Upstream commit d91c3e17f75f218022140dee18cf515292184a8f ]

Calling accept on a TCP socket with a TLS ulp attached results
in two sockets that share the same ulp context.
The ulp context is freed while a socket is destroyed, so
after one of the sockets is released, the second second will
trigger a use after free when it tries to access the ulp context
attached to it.
We restrict the TLS ulp to sockets in ESTABLISHED state
to prevent the scenario above.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Reported-by: syzbot+904e7cd6c5c741609228@syzkaller.appspotmail.com
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/tls/tls_main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 60aff60e30ad..70757098f79b 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -444,6 +444,15 @@ static int tls_init(struct sock *sk)
 	struct tls_context *ctx;
 	int rc = 0;
 
+	/* The TLS ulp is currently supported only for TCP sockets
+	 * in ESTABLISHED state.
+	 * Supporting sockets in LISTEN state will require us
+	 * to modify the accept implementation to clone rather then
+	 * share the ulp context.
+	 */
+	if (sk->sk_state != TCP_ESTABLISHED)
+		return -ENOTSUPP;
+
 	/* allocate tls context */
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
 	if (!ctx) {
-- 
2.14.3


From 0df56741594f9f719fd5c36adf030a472f0a84b0 Mon Sep 17 00:00:00 2001
From: Sabrina Dubroca <sd@queasysnail.net>
Date: Tue, 16 Jan 2018 16:04:26 +0100
Subject: [PATCH 33/37] tls: fix sw_ctx leak

[ Upstream commit cf6d43ef66f416282121f436ce1bee9a25199d52 ]

During setsockopt(SOL_TCP, TLS_TX), if initialization of the software
context fails in tls_set_sw_offload(), we leak sw_ctx. We also don't
reassign ctx->priv_ctx to NULL, so we can't even do another attempt to
set it up on the same socket, as it will fail with -EEXIST.

Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/tls/tls_sw.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 81cc4325682f..83f886d7c1f8 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -697,18 +697,17 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
 	}
 	default:
 		rc = -EINVAL;
-		goto out;
+		goto free_priv;
 	}
 
 	ctx->prepend_size = TLS_HEADER_SIZE + nonce_size;
 	ctx->tag_size = tag_size;
 	ctx->overhead_size = ctx->prepend_size + ctx->tag_size;
 	ctx->iv_size = iv_size;
-	ctx->iv = kmalloc(iv_size + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
-			  GFP_KERNEL);
+	ctx->iv = kmalloc(iv_size + TLS_CIPHER_AES_GCM_128_SALT_SIZE, GFP_KERNEL);
 	if (!ctx->iv) {
 		rc = -ENOMEM;
-		goto out;
+		goto free_priv;
 	}
 	memcpy(ctx->iv, gcm_128_info->salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
 	memcpy(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv, iv_size);
@@ -756,7 +755,7 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
 
 	rc = crypto_aead_setauthsize(sw_ctx->aead_send, ctx->tag_size);
 	if (!rc)
-		goto out;
+		return 0;
 
 free_aead:
 	crypto_free_aead(sw_ctx->aead_send);
@@ -767,6 +766,9 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
 free_iv:
 	kfree(ctx->iv);
 	ctx->iv = NULL;
+free_priv:
+	kfree(ctx->priv_ctx);
+	ctx->priv_ctx = NULL;
 out:
 	return rc;
 }
-- 
2.14.3


From f043c4c82c0cec21d8cea70e1c00bbdfa7dcdce4 Mon Sep 17 00:00:00 2001
From: Sabrina Dubroca <sd@queasysnail.net>
Date: Tue, 16 Jan 2018 16:04:27 +0100
Subject: [PATCH 34/37] tls: return -EBUSY if crypto_info is already set

[ Upstream commit 877d17c79b66466942a836403773276e34fe3614 ]

do_tls_setsockopt_tx returns 0 without doing anything when crypto_info
is already set. Silent failure is confusing for users.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/tls/tls_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 70757098f79b..06688edb6855 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -364,8 +364,10 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval,
 	crypto_info = &ctx->crypto_send;
 
 	/* Currently we don't support set crypto info more than one time */
-	if (TLS_CRYPTO_INFO_READY(crypto_info))
+	if (TLS_CRYPTO_INFO_READY(crypto_info)) {
+		rc = -EBUSY;
 		goto out;
+	}
 
 	switch (tmp_crypto_info.cipher_type) {
 	case TLS_CIPHER_AES_GCM_128: {
-- 
2.14.3


From 0333795e467e646583d48419e1656515d40a5c89 Mon Sep 17 00:00:00 2001
From: Sabrina Dubroca <sd@queasysnail.net>
Date: Tue, 16 Jan 2018 16:04:28 +0100
Subject: [PATCH 35/37] tls: reset crypto_info when do_tls_setsockopt_tx fails

[ Upstream commit 6db959c82eb039a151d95a0f8b7dea643657327a ]

The current code copies directly from userspace to ctx->crypto_send, but
doesn't always reinitialize it to 0 on failure. This causes any
subsequent attempt to use this setsockopt to fail because of the
TLS_CRYPTO_INFO_READY check, eventhough crypto_info is not actually
ready.

This should result in a correctly set up socket after the 3rd call, but
currently it does not:

    size_t s = sizeof(struct tls12_crypto_info_aes_gcm_128);
    struct tls12_crypto_info_aes_gcm_128 crypto_good = {
        .info.version = TLS_1_2_VERSION,
        .info.cipher_type = TLS_CIPHER_AES_GCM_128,
    };

    struct tls12_crypto_info_aes_gcm_128 crypto_bad_type = crypto_good;
    crypto_bad_type.info.cipher_type = 42;

    setsockopt(sock, SOL_TLS, TLS_TX, &crypto_bad_type, s);
    setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s - 1);
    setsockopt(sock, SOL_TLS, TLS_TX, &crypto_good, s);

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/tls/tls_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 06688edb6855..282361ac0263 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -373,7 +373,7 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval,
 	case TLS_CIPHER_AES_GCM_128: {
 		if (optlen != sizeof(struct tls12_crypto_info_aes_gcm_128)) {
 			rc = -EINVAL;
-			goto out;
+			goto err_crypto_info;
 		}
 		rc = copy_from_user(
 		  crypto_info,
@@ -388,7 +388,7 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval,
 	}
 	default:
 		rc = -EINVAL;
-		goto out;
+		goto err_crypto_info;
 	}
 
 	ctx->sk_write_space = sk->sk_write_space;
-- 
2.14.3


From 5b0bcb20d4e0b7b383a1c801c651259924b5e809 Mon Sep 17 00:00:00 2001
From: Lorenzo Colitti <lorenzo@google.com>
Date: Thu, 11 Jan 2018 18:36:26 +0900
Subject: [PATCH 36/37] net: ipv4: Make "ip route get" match iif lo rules
 again.

[ Upstream commit 6503a30440962f1e1ccb8868816b4e18201218d4 ]

Commit 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu
versions of route lookup") broke "ip route get" in the presence
of rules that specify iif lo.

Host-originated traffic always has iif lo, because
ip_route_output_key_hash and ip6_route_output_flags set the flow
iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a
convenient way to select only originated traffic and not forwarded
traffic.

inet_rtm_getroute used to match these rules correctly because
even though it sets the flow iif to 0, it called
ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX.
But now that it calls ip_route_output_key_hash_rcu, the ifindex
will remain 0 and not match the iif lo in the rule. As a result,
"ip route get" will return ENETUNREACH.

Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork_test.py passes again
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/route.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 804bead564db..0ba88efca7ad 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2762,6 +2762,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 		if (err == 0 && rt->dst.error)
 			err = -rt->dst.error;
 	} else {
+		fl4.flowi4_iif = LOOPBACK_IFINDEX;
 		rt = ip_route_output_key_hash_rcu(net, &fl4, &res, skb);
 		err = 0;
 		if (IS_ERR(rt))
-- 
2.14.3


From f7c7dfeeb221bcd4a731ad6a1f50dd24db3a3412 Mon Sep 17 00:00:00 2001
From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 22 Jan 2018 16:06:37 -0500
Subject: [PATCH 37/37] vmxnet3: repair memory leak

[ Upstream commit 848b159835ddef99cc4193083f7e786c3992f580 ]

with the introduction of commit
b0eb57cb97e7837ebb746404c2c58c6f536f23fa, it appears that rq->buf_info
is improperly handled.  While it is heap allocated when an rx queue is
setup, and freed when torn down, an old line of code in
vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0]
being set to NULL prior to its being freed, causing a memory leak, which
eventually exhausts the system on repeated create/destroy operations
(for example, when  the mtu of a vmxnet3 interface is changed
frequently.

Fix is pretty straight forward, just move the NULL set to after the
free.

Tested by myself with successful results

Applies to net, and should likely be queued for stable, please

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-By: boyang@redhat.com
CC: boyang@redhat.com
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: David S. Miller <davem@davemloft.net>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/vmxnet3/vmxnet3_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index d1c7029ded7c..cf95290b160c 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1616,7 +1616,6 @@ static void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq,
 					  rq->rx_ring[i].basePA);
 			rq->rx_ring[i].base = NULL;
 		}
-		rq->buf_info[i] = NULL;
 	}
 
 	if (rq->data_ring.base) {
@@ -1638,6 +1637,7 @@ static void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq,
 			(rq->rx_ring[0].size + rq->rx_ring[1].size);
 		dma_free_coherent(&adapter->pdev->dev, sz, rq->buf_info[0],
 				  rq->buf_info_pa);
+		rq->buf_info[0] = rq->buf_info[1] = NULL;
 	}
 }
 
-- 
2.14.3