* [patch added to 3.12-stable] net: fix sk_mem_reclaim_partial()
From: Jiri Slaby @ 2016-11-24  9:17 UTC
  To: stable; +Cc: Eric Dumazet, David S . Miller, Jiri Slaby

From: Eric Dumazet <edumazet@google.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

commit 1a24e04e4b50939daa3041682b38b82c896ca438 upstream.

The goal of sk_mem_reclaim_partial() is to ensure that each socket keeps
one SK_MEM_QUANTUM of forward allocation. This is needed both for
performance and for better handling of memory pressure situations in
follow-up patches.

SK_MEM_QUANTUM is currently a page, but might be reduced to 4096 bytes
as some arches have 64KB pages.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 include/net/sock.h | 6 +++---
 net/core/sock.c    | 9 +++++----
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 6ed6df149bce..cd6626f99ba3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1380,7 +1380,7 @@ static inline struct inode *SOCK_INODE(struct socket *socket)
  * Functions for memory accounting
  */
 extern int __sk_mem_schedule(struct sock *sk, int size, int kind);
-extern void __sk_mem_reclaim(struct sock *sk);
+void __sk_mem_reclaim(struct sock *sk, int amount);
 
 #define SK_MEM_QUANTUM ((int)PAGE_SIZE)
 #define SK_MEM_QUANTUM_SHIFT ilog2(SK_MEM_QUANTUM)
@@ -1421,7 +1421,7 @@ static inline void sk_mem_reclaim(struct sock *sk)
 	if (!sk_has_account(sk))
 		return;
 	if (sk->sk_forward_alloc >= SK_MEM_QUANTUM)
-		__sk_mem_reclaim(sk);
+		__sk_mem_reclaim(sk, sk->sk_forward_alloc);
 }
 
 static inline void sk_mem_reclaim_partial(struct sock *sk)
@@ -1429,7 +1429,7 @@ static inline void sk_mem_reclaim_partial(struct sock *sk)
 	if (!sk_has_account(sk))
 		return;
 	if (sk->sk_forward_alloc > SK_MEM_QUANTUM)
-		__sk_mem_reclaim(sk);
+		__sk_mem_reclaim(sk, sk->sk_forward_alloc - 1);
 }
 
 static inline void sk_mem_charge(struct sock *sk, int size)
diff --git a/net/core/sock.c b/net/core/sock.c
index 4ac4c13352ab..516b45c82093 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2095,12 +2095,13 @@ EXPORT_SYMBOL(__sk_mem_schedule);
 /**
  *	__sk_reclaim - reclaim memory_allocated
  *	@sk: socket
+ *	@amount: number of bytes (rounded down to a SK_MEM_QUANTUM multiple)
  */
-void __sk_mem_reclaim(struct sock *sk)
+void __sk_mem_reclaim(struct sock *sk, int amount)
 {
-	sk_memory_allocated_sub(sk,
-				sk->sk_forward_alloc >> SK_MEM_QUANTUM_SHIFT);
-	sk->sk_forward_alloc &= SK_MEM_QUANTUM - 1;
+	amount >>= SK_MEM_QUANTUM_SHIFT;
+	sk_memory_allocated_sub(sk, amount);
+	sk->sk_forward_alloc -= amount << SK_MEM_QUANTUM_SHIFT;
 
 	if (sk_under_memory_pressure(sk) &&
 	    (sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)))
-- 
2.10.2


* [patch added to 3.12-stable] tcp: fix overflow in __tcp_retransmit_skb()
From: Jiri Slaby @ 2016-11-24  9:17 UTC
  To: stable; +Cc: Eric Dumazet, David S . Miller, Jiri Slaby

From: Eric Dumazet <edumazet@google.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit ffb4d6c8508657824bcef68a36b2a0f9d8c09d10 ]

If a TCP socket gets a large write queue, an integer overflow can happen
in a test in __tcp_retransmit_skb(), preventing all retransmits.

The flow then stalls and resets after timeouts.

Tested:

sysctl -w net.core.wmem_max=1000000000
netperf -H dest -- -s 1000000000

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 net/ipv4/tcp_output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index aa72c9d604a0..f08921156be8 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2336,7 +2336,8 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	 * copying overhead: fragmentation, tunneling, mangling etc.
 	 */
 	if (atomic_read(&sk->sk_wmem_alloc) >
-	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
+	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
+		  sk->sk_sndbuf))
 		return -EAGAIN;
 
 	if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) {
-- 
2.10.2


* [patch added to 3.12-stable] net: avoid sk_forward_alloc overflows
From: Jiri Slaby @ 2016-11-24  9:17 UTC
  To: stable; +Cc: Eric Dumazet, David S . Miller, Jiri Slaby

From: Eric Dumazet <edumazet@google.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 20c64d5cd5a2bdcdc8982a06cb05e5e1bd851a3d ]

A malicious TCP receiver, sending SACK, can force the sender to split
skbs in write queue and increase its memory usage.

Then, when the socket is closed and its write queue purged, we might
overflow sk_forward_alloc (it becomes negative).

sk_mem_reclaim() does nothing in this case, and more than 2GB
are leaked from the TCP perspective (tcp_memory_allocated is not changed).

Then warnings trigger from inet_sock_destruct() and
sk_stream_kill_queues(), which see a non-zero sk_forward_alloc.

The whole TCP stack can get stuck because TCP is under memory pressure.

A simple fix is to preemptively reclaim from sk_mem_uncharge().

This makes sure a socket won't have more than 2 MB forward allocated
after a burst and idle period.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 include/net/sock.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index cd6626f99ba3..238e934dd3c3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1444,6 +1444,16 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
 	if (!sk_has_account(sk))
 		return;
 	sk->sk_forward_alloc += size;
+
+	/* Avoid a possible overflow.
+	 * TCP send queues can make this happen, if sk_mem_reclaim()
+	 * is not called and more than 2 GBytes are released at once.
+	 *
+	 * If we reach 2 MBytes, reclaim 1 MBytes right now, there is
+	 * no need to hold that much forward allocation anyway.
+	 */
+	if (unlikely(sk->sk_forward_alloc >= 1 << 21))
+		__sk_mem_reclaim(sk, 1 << 20);
 }
 
 static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
-- 
2.10.2


* [patch added to 3.12-stable] tcp: fix wrong checksum calculation on MTU probing
From: Jiri Slaby @ 2016-11-24  9:17 UTC
  To: stable; +Cc: Douglas Caetano dos Santos, David S . Miller, Jiri Slaby

From: Douglas Caetano dos Santos <douglascs@taghos.com.br>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 2fe664f1fcf7c4da6891f95708a7a56d3c024354 ]

With TCP MTU probing enabled and offload TX checksumming disabled,
tcp_mtu_probe() calculated the wrong checksum when a fragment being copied
into the probe's SKB had an odd length. This was caused by the direct use
of skb_copy_and_csum_bits() to calculate the checksum, as it pads the
fragment being copied, if needed. When this fragment was not the last, a
subsequent call used the previous checksum without considering this
padding.

The effect was a connection stalled in one direction, as even
retransmissions wouldn't solve the problem, because the checksum was never
recalculated for the full SKB length.

Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 net/ipv4/tcp_output.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f08921156be8..c807d5790ca1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1762,12 +1762,14 @@ static int tcp_mtu_probe(struct sock *sk)
 	len = 0;
 	tcp_for_write_queue_from_safe(skb, next, sk) {
 		copy = min_t(int, skb->len, probe_size - len);
-		if (nskb->ip_summed)
+		if (nskb->ip_summed) {
 			skb_copy_bits(skb, 0, skb_put(nskb, copy), copy);
-		else
-			nskb->csum = skb_copy_and_csum_bits(skb, 0,
-							    skb_put(nskb, copy),
-							    copy, nskb->csum);
+		} else {
+			__wsum csum = skb_copy_and_csum_bits(skb, 0,
+							     skb_put(nskb, copy),
+							     copy, 0);
+			nskb->csum = csum_block_add(nskb->csum, csum, len);
+		}
 
 		if (skb->len <= copy) {
 			/* We've eaten all the data from this skb.
-- 
2.10.2


* [patch added to 3.12-stable] ip6_gre: fix flowi6_proto value in ip6gre_xmit_other()
From: Jiri Slaby @ 2016-11-24  9:17 UTC
  To: stable; +Cc: Lance Richardson, David S . Miller, Jiri Slaby

From: Lance Richardson <lrichard@redhat.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit db32e4e49ce2b0e5fcc17803d011a401c0a637f6 ]

Similar to commit 3be07244b733 ("ip6_gre: fix flowi6_proto value in
xmit path"), set flowi6_proto to IPPROTO_GRE for output route lookup.

Up until now, ip6gre_xmit_other() has set flowi6_proto to a bogus value.
This affected output route lookup for packets sent on an ip6gretap device
in cases where routing was dependent on the value of flowi6_proto.

Since the correct proto is already set in the tunnel flowi6 template via
commit 252f3f5a1189 ("ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit
path."), simply delete the line setting the incorrect flowi6_proto value.

Suggested-by: Jiri Benc <jbenc@redhat.com>
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 net/ipv6/ip6_gre.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 737af492ed75..6b5acd50103f 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -895,7 +895,6 @@ static int ip6gre_xmit_other(struct sk_buff *skb, struct net_device *dev)
 		encap_limit = t->parms.encap_limit;
 
 	memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
-	fl6.flowi6_proto = skb->protocol;
 
 	err = ip6gre_xmit2(skb, dev, 0, &fl6, encap_limit, &mtu);
 
-- 
2.10.2


* [patch added to 3.12-stable] ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route
From: Jiri Slaby @ 2016-11-24  9:17 UTC
  To: stable; +Cc: Nikolay Aleksandrov, David S . Miller, Jiri Slaby

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 2cf750704bb6d7ed8c7d732e071dd1bc890ea5e8 ]

Since the commit below, the ipmr/ip6mr rtnl_unicast() code uses the portid
instead of the previous dst_pid, which was copied from in_skb's portid.
Since the skb is new, the portid is 0 at that point, so the packets are sent
to the kernel and we get scheduling while atomic or a deadlock (depending
on where it happens) by trying to acquire rtnl twice.
Also, since this is RTM_GETROUTE, it can be triggered by a normal user.

Here's the sleeping while atomic trace:
[ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
[ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[ 7858.212881] 2 locks held by swapper/0/0:
[ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
[ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
[ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
[ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
[ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
[ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
[ 7858.215251] Call Trace:
[ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
[ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
[ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
[ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
[ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
[ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
[ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
[ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
[ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
[ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
[ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
[ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
[ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
[ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
[ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
[ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
[ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
[ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
[ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
[ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
[ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
[ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
[ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
[ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
[ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
[ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
[ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
[ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
[ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
[ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
[ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
[ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a

Fixes: 2942e9005056 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 include/linux/mroute.h  | 2 +-
 include/linux/mroute6.h | 2 +-
 net/ipv4/ipmr.c         | 3 ++-
 net/ipv4/route.c        | 3 ++-
 net/ipv6/ip6mr.c        | 5 +++--
 net/ipv6/route.c        | 4 +++-
 6 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/mroute.h b/include/linux/mroute.h
index 79aaa9fc1a15..d5277fc3ce2e 100644
--- a/include/linux/mroute.h
+++ b/include/linux/mroute.h
@@ -103,5 +103,5 @@ struct mfc_cache {
 struct rtmsg;
 extern int ipmr_get_route(struct net *net, struct sk_buff *skb,
 			  __be32 saddr, __be32 daddr,
-			  struct rtmsg *rtm, int nowait);
+			  struct rtmsg *rtm, int nowait, u32 portid);
 #endif
diff --git a/include/linux/mroute6.h b/include/linux/mroute6.h
index 66982e764051..f831155dc7d1 100644
--- a/include/linux/mroute6.h
+++ b/include/linux/mroute6.h
@@ -115,7 +115,7 @@ struct mfc6_cache {
 
 struct rtmsg;
 extern int ip6mr_get_route(struct net *net, struct sk_buff *skb,
-			   struct rtmsg *rtm, int nowait);
+			   struct rtmsg *rtm, int nowait, u32 portid);
 
 #ifdef CONFIG_IPV6_MROUTE
 extern struct sock *mroute6_socket(struct net *net, struct sk_buff *skb);
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index dccda72bac62..5643a10da91d 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2188,7 +2188,7 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
 
 int ipmr_get_route(struct net *net, struct sk_buff *skb,
 		   __be32 saddr, __be32 daddr,
-		   struct rtmsg *rtm, int nowait)
+		   struct rtmsg *rtm, int nowait, u32 portid)
 {
 	struct mfc_cache *cache;
 	struct mr_table *mrt;
@@ -2233,6 +2233,7 @@ int ipmr_get_route(struct net *net, struct sk_buff *skb,
 			return -ENOMEM;
 		}
 
+		NETLINK_CB(skb2).portid = portid;
 		skb_push(skb2, sizeof(struct iphdr));
 		skb_reset_network_header(skb2);
 		iph = ip_hdr(skb2);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 1454176792b3..2d709773dc6c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2427,7 +2427,8 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src,
 		    IPV4_DEVCONF_ALL(net, MC_FORWARDING)) {
 			int err = ipmr_get_route(net, skb,
 						 fl4->saddr, fl4->daddr,
-						 r, nowait);
+						 r, nowait, portid);
+
 			if (err <= 0) {
 				if (!nowait) {
 					if (err == 0)
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 86d30e60242a..56aa540d77f6 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2273,8 +2273,8 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, struct sk_buff *skb,
 	return 1;
 }
 
-int ip6mr_get_route(struct net *net,
-		    struct sk_buff *skb, struct rtmsg *rtm, int nowait)
+int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm,
+		    int nowait, u32 portid)
 {
 	int err;
 	struct mr6_table *mrt;
@@ -2319,6 +2319,7 @@ int ip6mr_get_route(struct net *net,
 			return -ENOMEM;
 		}
 
+		NETLINK_CB(skb2).portid = portid;
 		skb_reset_transport_header(skb2);
 
 		skb_put(skb2, sizeof(struct ipv6hdr));
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f862c7688c99..e19817a090c7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2614,7 +2614,9 @@ static int rt6_fill_node(struct net *net,
 	if (iif) {
 #ifdef CONFIG_IPV6_MROUTE
 		if (ipv6_addr_is_multicast(&rt->rt6i_dst.addr)) {
-			int err = ip6mr_get_route(net, skb, rtm, nowait);
+			int err = ip6mr_get_route(net, skb, rtm, nowait,
+						  portid);
+
 			if (err <= 0) {
 				if (!nowait) {
 					if (err == 0)
-- 
2.10.2


* [patch added to 3.12-stable] net: Add netdev all_adj_list refcnt propagation to fix panic
From: Jiri Slaby @ 2016-11-24  9:17 UTC
  To: stable; +Cc: Andrew Collins, David S . Miller, Jiri Slaby

From: Andrew Collins <acollins@cradlepoint.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 93409033ae653f1c9a949202fb537ab095b2092f ]

This is a respin of a patch to fix a relatively easily reproducible kernel
panic related to the all_adj_list handling for netdevs in recent kernels.

The following sequence of commands will reproduce the issue:

ip link add link eth0 name eth0.100 type vlan id 100
ip link add link eth0 name eth0.200 type vlan id 200
ip link add name testbr type bridge
ip link set eth0.100 master testbr
ip link set eth0.200 master testbr
ip link add link testbr mac0 type macvlan
ip link delete dev testbr

This creates an upper/lower tree of (excuse the poor ASCII art):

            /---eth0.100-eth0
mac0-testbr-
            \---eth0.200-eth0

When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from
the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one
reference to eth0 is added, so this results in a panic.

This change adds reference count propagation so things are handled properly.

Matthias Schiffer reported a similar crash in batman-adv:

https://github.com/freifunk-gluon/gluon/issues/680
https://www.open-mesh.org/issues/247

which this patch also seems to resolve.

Signed-off-by: Andrew Collins <acollins@cradlepoint.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 net/core/dev.c | 76 +++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 43 insertions(+), 33 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index d30c12263f38..b3788eb33ce4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4546,6 +4546,7 @@ EXPORT_SYMBOL(netdev_master_upper_dev_get_rcu);
 
 static int __netdev_adjacent_dev_insert(struct net_device *dev,
 					struct net_device *adj_dev,
+					u16 ref_nr,
 					bool neighbour, bool master,
 					bool upper)
 {
@@ -4555,7 +4556,7 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
 
 	if (adj) {
 		BUG_ON(neighbour);
-		adj->ref_nr++;
+		adj->ref_nr += ref_nr;
 		return 0;
 	}
 
@@ -4566,7 +4567,7 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
 	adj->dev = adj_dev;
 	adj->master = master;
 	adj->neighbour = neighbour;
-	adj->ref_nr = 1;
+	adj->ref_nr = ref_nr;
 
 	dev_hold(adj_dev);
 	pr_debug("dev_hold for %s, because of %s link added from %s to %s\n",
@@ -4589,22 +4590,25 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
 
 static inline int __netdev_upper_dev_insert(struct net_device *dev,
 					    struct net_device *udev,
+					    u16 ref_nr,
 					    bool master, bool neighbour)
 {
-	return __netdev_adjacent_dev_insert(dev, udev, neighbour, master,
-					    true);
+	return __netdev_adjacent_dev_insert(dev, udev, ref_nr, neighbour,
+					    master, true);
 }
 
 static inline int __netdev_lower_dev_insert(struct net_device *dev,
 					    struct net_device *ldev,
+					    u16 ref_nr,
 					    bool neighbour)
 {
-	return __netdev_adjacent_dev_insert(dev, ldev, neighbour, false,
+	return __netdev_adjacent_dev_insert(dev, ldev, ref_nr, neighbour, false,
 					    false);
 }
 
 void __netdev_adjacent_dev_remove(struct net_device *dev,
-				  struct net_device *adj_dev, bool upper)
+				  struct net_device *adj_dev, u16 ref_nr,
+				  bool upper)
 {
 	struct netdev_adjacent *adj;
 
@@ -4616,8 +4620,8 @@ void __netdev_adjacent_dev_remove(struct net_device *dev,
 	if (!adj)
 		BUG();
 
-	if (adj->ref_nr > 1) {
-		adj->ref_nr--;
+	if (adj->ref_nr > ref_nr) {
+		adj->ref_nr -= ref_nr;
 		return;
 	}
 
@@ -4630,30 +4634,33 @@ void __netdev_adjacent_dev_remove(struct net_device *dev,
 }
 
 static inline void __netdev_upper_dev_remove(struct net_device *dev,
-					     struct net_device *udev)
+					     struct net_device *udev,
+					     u16 ref_nr)
 {
-	return __netdev_adjacent_dev_remove(dev, udev, true);
+	return __netdev_adjacent_dev_remove(dev, udev, ref_nr, true);
 }
 
 static inline void __netdev_lower_dev_remove(struct net_device *dev,
-					     struct net_device *ldev)
+					     struct net_device *ldev,
+					     u16 ref_nr)
 {
-	return __netdev_adjacent_dev_remove(dev, ldev, false);
+	return __netdev_adjacent_dev_remove(dev, ldev, ref_nr, false);
 }
 
 int __netdev_adjacent_dev_insert_link(struct net_device *dev,
 				      struct net_device *upper_dev,
-				      bool master, bool neighbour)
+				      u16 ref_nr, bool master, bool neighbour)
 {
 	int ret;
 
-	ret = __netdev_upper_dev_insert(dev, upper_dev, master, neighbour);
+	ret = __netdev_upper_dev_insert(dev, upper_dev, ref_nr, master,
+			neighbour);
 	if (ret)
 		return ret;
 
-	ret = __netdev_lower_dev_insert(upper_dev, dev, neighbour);
+	ret = __netdev_lower_dev_insert(upper_dev, dev, ref_nr, neighbour);
 	if (ret) {
-		__netdev_upper_dev_remove(dev, upper_dev);
+		__netdev_upper_dev_remove(dev, upper_dev, ref_nr);
 		return ret;
 	}
 
@@ -4661,23 +4668,25 @@ int __netdev_adjacent_dev_insert_link(struct net_device *dev,
 }
 
 static inline int __netdev_adjacent_dev_link(struct net_device *dev,
-					     struct net_device *udev)
+					     struct net_device *udev,
+					     u16 ref_nr)
 {
-	return __netdev_adjacent_dev_insert_link(dev, udev, false, false);
+	return __netdev_adjacent_dev_insert_link(dev, udev, ref_nr, false,
+						false);
 }
 
 static inline int __netdev_adjacent_dev_link_neighbour(struct net_device *dev,
 						       struct net_device *udev,
 						       bool master)
 {
-	return __netdev_adjacent_dev_insert_link(dev, udev, master, true);
+	return __netdev_adjacent_dev_insert_link(dev, udev, 1, master, true);
 }
 
 void __netdev_adjacent_dev_unlink(struct net_device *dev,
-				  struct net_device *upper_dev)
+				  struct net_device *upper_dev, u16 ref_nr)
 {
-	__netdev_upper_dev_remove(dev, upper_dev);
-	__netdev_lower_dev_remove(upper_dev, dev);
+	__netdev_upper_dev_remove(dev, upper_dev, ref_nr);
+	__netdev_lower_dev_remove(upper_dev, dev, ref_nr);
 }
 
 
@@ -4713,7 +4722,8 @@ static int __netdev_upper_dev_link(struct net_device *dev,
 	 */
 	list_for_each_entry(i, &dev->lower_dev_list, list) {
 		list_for_each_entry(j, &upper_dev->upper_dev_list, list) {
-			ret = __netdev_adjacent_dev_link(i->dev, j->dev);
+			ret = __netdev_adjacent_dev_link(i->dev, j->dev,
+					i->ref_nr);
 			if (ret)
 				goto rollback_mesh;
 		}
@@ -4721,14 +4731,14 @@ static int __netdev_upper_dev_link(struct net_device *dev,
 
 	/* add dev to every upper_dev's upper device */
 	list_for_each_entry(i, &upper_dev->upper_dev_list, list) {
-		ret = __netdev_adjacent_dev_link(dev, i->dev);
+		ret = __netdev_adjacent_dev_link(dev, i->dev, i->ref_nr);
 		if (ret)
 			goto rollback_upper_mesh;
 	}
 
 	/* add upper_dev to every dev's lower device */
 	list_for_each_entry(i, &dev->lower_dev_list, list) {
-		ret = __netdev_adjacent_dev_link(i->dev, upper_dev);
+		ret = __netdev_adjacent_dev_link(i->dev, upper_dev, i->ref_nr);
 		if (ret)
 			goto rollback_lower_mesh;
 	}
@@ -4741,7 +4751,7 @@ rollback_lower_mesh:
 	list_for_each_entry(i, &dev->lower_dev_list, list) {
 		if (i == to_i)
 			break;
-		__netdev_adjacent_dev_unlink(i->dev, upper_dev);
+		__netdev_adjacent_dev_unlink(i->dev, upper_dev, i->ref_nr);
 	}
 
 	i = NULL;
@@ -4751,7 +4761,7 @@ rollback_upper_mesh:
 	list_for_each_entry(i, &upper_dev->upper_dev_list, list) {
 		if (i == to_i)
 			break;
-		__netdev_adjacent_dev_unlink(dev, i->dev);
+		__netdev_adjacent_dev_unlink(dev, i->dev, i->ref_nr);
 	}
 
 	i = j = NULL;
@@ -4763,13 +4773,13 @@ rollback_mesh:
 		list_for_each_entry(j, &upper_dev->upper_dev_list, list) {
 			if (i == to_i && j == to_j)
 				break;
-			__netdev_adjacent_dev_unlink(i->dev, j->dev);
+			__netdev_adjacent_dev_unlink(i->dev, j->dev, i->ref_nr);
 		}
 		if (i == to_i)
 			break;
 	}
 
-	__netdev_adjacent_dev_unlink(dev, upper_dev);
+	__netdev_adjacent_dev_unlink(dev, upper_dev, 1);
 
 	return ret;
 }
@@ -4823,7 +4833,7 @@ void netdev_upper_dev_unlink(struct net_device *dev,
 	struct netdev_adjacent *i, *j;
 	ASSERT_RTNL();
 
-	__netdev_adjacent_dev_unlink(dev, upper_dev);
+	__netdev_adjacent_dev_unlink(dev, upper_dev, 1);
 
 	/* Here is the tricky part. We must remove all dev's lower
 	 * devices from all upper_dev's upper devices and vice
@@ -4831,16 +4841,16 @@ void netdev_upper_dev_unlink(struct net_device *dev,
 	 */
 	list_for_each_entry(i, &dev->lower_dev_list, list)
 		list_for_each_entry(j, &upper_dev->upper_dev_list, list)
-			__netdev_adjacent_dev_unlink(i->dev, j->dev);
+			__netdev_adjacent_dev_unlink(i->dev, j->dev, i->ref_nr);
 
 	/* remove also the devices itself from lower/upper device
 	 * list
 	 */
 	list_for_each_entry(i, &dev->lower_dev_list, list)
-		__netdev_adjacent_dev_unlink(i->dev, upper_dev);
+		__netdev_adjacent_dev_unlink(i->dev, upper_dev, i->ref_nr);
 
 	list_for_each_entry(i, &upper_dev->upper_dev_list, list)
-		__netdev_adjacent_dev_unlink(dev, i->dev);
+		__netdev_adjacent_dev_unlink(dev, i->dev, i->ref_nr);
 
 	call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev);
 }
-- 
2.10.2


* [patch added to 3.12-stable] packet: call fanout_release, while UNREGISTERING a netdev
From: Jiri Slaby @ 2016-11-24  9:17 UTC
  To: stable; +Cc: Anoob Soman, David S . Miller, Jiri Slaby

From: Anoob Soman <anoob.soman@citrix.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 6664498280cf17a59c3e7cf1a931444c02633ed1 ]

If a socket has the FANOUT sockopt set, a new prot_hook is registered as
part of fanout_add(). When af_packet processes a NETDEV_UNREGISTER event,
__fanout_unlink() is called for all sockets, but the prot_hook that was
registered as part of fanout_add() is not removed. Call fanout_release() on
NETDEV_UNREGISTER; it removes the prot_hook and removes the fanout from
fanout_list.

This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo().
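To illustrate the invariant involved, here is a minimal userspace model, not kernel code. It assumes a single-socket fanout group and reduces dev->ptype_specific to a plain counter; all names (model_dev, model_sock, model_fanout_add(), model_hooks_left() and so on) are hypothetical. Without the fanout release step, one hook stays attached to the device, which is the condition the BUG_ON() in netdev_run_todo() catches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: dev->ptype_specific reduced to a hook counter. */
struct model_dev {
	int ptype_specific;
};

/* Hypothetical model of a packet socket in a single-member fanout group. */
struct model_sock {
	struct model_dev *dev;
	bool fanout_linked;	/* socket is a member of the fanout group */
	bool fanout_hook;	/* the group's prot_hook is registered on dev */
};

/* Models fanout_add(): joining the group registers the group's hook. */
static void model_fanout_add(struct model_sock *po)
{
	po->fanout_linked = true;
	po->fanout_hook = true;
	po->dev->ptype_specific++;
}

/* Models fanout_release() for the last member: drop the group's hook. */
static void model_fanout_release(struct model_sock *po)
{
	po->fanout_linked = false;
	if (po->fanout_hook) {
		po->fanout_hook = false;
		po->dev->ptype_specific--;
	}
}

/* Models NETDEV_UNREGISTER handling; 'fixed' adds the fanout_release() call. */
static void model_unregister(struct model_sock *po, bool fixed)
{
	po->fanout_linked = false;	/* stand-in for __fanout_unlink() */
	if (fixed)
		model_fanout_release(po);
}

/* Returns the number of hooks still on the device after unregistering;
 * anything non-zero would trip the BUG_ON() in netdev_run_todo(). */
static int model_hooks_left(bool fixed)
{
	struct model_dev dev = { .ptype_specific = 0 };
	struct model_sock po = { .dev = &dev };

	model_fanout_add(&po);
	model_unregister(&po, fixed);
	return dev.ptype_specific;
}
```

In this model, model_hooks_left(false) leaves one hook attached, while model_hooks_left(true) leaves none.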

Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 net/packet/af_packet.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 1e9cb9921daa..3f9804b2802a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3365,6 +3365,7 @@ static int packet_notifier(struct notifier_block *this,
 				}
 				if (msg == NETDEV_UNREGISTER) {
 					packet_cached_dev_reset(po);
+					fanout_release(sk);
 					po->ifindex = -1;
 					if (po->prot_hook.dev)
 						dev_put(po->prot_hook.dev);
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [patch added to 3.12-stable] ipv6: correctly add local routes when lo goes up
  2016-11-24  9:17 [patch added to 3.12-stable] net: fix sk_mem_reclaim_partial() Jiri Slaby
                   ` (6 preceding siblings ...)
  2016-11-24  9:17 ` [patch added to 3.12-stable] packet: call fanout_release, while UNREGISTERING a netdev Jiri Slaby
@ 2016-11-24  9:17 ` Jiri Slaby
  2016-11-24  9:17 ` [patch added to 3.12-stable] bridge: multicast: restore perm router ports on multicast enable Jiri Slaby
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Jiri Slaby @ 2016-11-24  9:17 UTC (permalink / raw)
  To: stable
  Cc: Nicolas Dichtel, Balakumaran Kannan, Maruthi Thotad,
	Sabrina Dubroca, Hannes Frederic Sowa, Weilong Chen, Gao feng,
	David S . Miller, Jiri Slaby

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit a220445f9f4382c36a53d8ef3e08165fa27f7e2c ]

The goal of the patch is to fix this scenario:
 ip link add dummy1 type dummy
 ip link set dummy1 up
 ip link set lo down ; ip link set lo up

After that sequence, the local route to the link layer address of dummy1 is
not there anymore.

When the loopback is set down, all local routes are deleted by
addrconf_ifdown()/rt6_ifdown(). At this point, the rt6_info entry still
exists because the corresponding idev holds a reference on it. After the
RCU grace period, dst_rcu_free() is called, and thus ___dst_free(), which
sets obsolete to DST_OBSOLETE_DEAD.

In this case, init_loopback() is called before dst_rcu_free(), so obsolete
is still set to something <= 0 and the function doesn't add the route
again. To avoid that race, check the rt6 refcount (rt6i_ref) instead.
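The two checks can be contrasted with a hypothetical, simplified model of the fields involved. This is not the kernel's struct rt6_info; model_rt, stale_by_obsolete() and stale_by_refcnt() are made-up names. It assumes, as described above, that obsolete only becomes positive once the deferred free has run, while the reference count drops as soon as the route is removed from the table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two rt6_info fields that matter here. */
struct model_rt {
	int obsolete;	/* becomes > 0 (DEAD) only when the deferred free runs */
	int ref;	/* stand-in for rt6i_ref: references from the routing table */
};

/* Old test in init_loopback(): depends on the RCU callback having run. */
static bool stale_by_obsolete(const struct model_rt *rt)
{
	return rt->obsolete > 0;
}

/* New test: the route is gone from the table as soon as ref drops to 0. */
static bool stale_by_refcnt(const struct model_rt *rt)
{
	return rt->ref == 0;
}
```

In the race above (route already removed, so ref == 0, but obsolete still <= 0), stale_by_obsolete() returns false and the route is not re-added, whereas stale_by_refcnt() returns true.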

Fixes: 25fb6ca4ed9c ("net IPv6 : Fix broken IPv6 routing table after loopback down-up")
Fixes: a881ae1f625c ("ipv6: don't call addrconf_dst_alloc again when enable lo")
Fixes: 33d99113b110 ("ipv6: reallocate addrconf router for ipv6 address when lo device up")
Reported-by: Francesco Santoro <francesco.santoro@6wind.com>
Reported-by: Samuel Gauthier <samuel.gauthier@6wind.com>
CC: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
CC: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
CC: Sabrina Dubroca <sd@queasysnail.net>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Weilong Chen <chenweilong@huawei.com>
CC: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 net/ipv6/addrconf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index bbf35875e4ef..1e31fc5477e8 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2648,7 +2648,7 @@ static void init_loopback(struct net_device *dev)
 				 * lo device down, release this obsolete dst and
 				 * reallocate a new router for ifa.
 				 */
-				if (sp_ifa->rt->dst.obsolete > 0) {
+				if (!atomic_read(&sp_ifa->rt->rt6i_ref)) {
 					ip6_rt_put(sp_ifa->rt);
 					sp_ifa->rt = NULL;
 				} else {
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [patch added to 3.12-stable] bridge: multicast: restore perm router ports on multicast enable
  2016-11-24  9:17 [patch added to 3.12-stable] net: fix sk_mem_reclaim_partial() Jiri Slaby
                   ` (7 preceding siblings ...)
  2016-11-24  9:17 ` [patch added to 3.12-stable] ipv6: correctly add local routes when lo goes up Jiri Slaby
@ 2016-11-24  9:17 ` Jiri Slaby
  2016-11-24  9:17   ` Jiri Slaby
  2016-11-24  9:18 ` [patch added to 3.12-stable] sctp: validate chunk len before actually using it Jiri Slaby
  10 siblings, 0 replies; 15+ messages in thread
From: Jiri Slaby @ 2016-11-24  9:17 UTC (permalink / raw)
  To: stable; +Cc: Nikolay Aleksandrov, David S . Miller, Jiri Slaby

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit 7cb3f9214dfa443c1ccc2be637dcc6344cc203f0 ]

Satish reported a problem with permanent multicast router ports not
getting re-enabled after a certain series of events. In particular, if
multicast snooping has been disabled and the port goes to the disabled
state, the port is deleted from the router port list. If it then moves
into a non-disabled state, it is not re-added because multicast snooping
is still disabled, and enabling snooping later does nothing.

Steps to reproduce: set up br0 with snooping enabled and eth1 added as a
permanent router (multicast_router = 2):
1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
2. $ ip l set eth1 down
^ This step deletes the interface from the router list
3. $ ip l set eth1 up
^ This step does not add it again because mcast snooping is disabled
4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
5. $ bridge -d -s mdb show
<empty>

At this point multicast snooping is enabled and eth1 is a permanent router
(value = 2), but it is not in the router list, which is incorrect.

After this change:
1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
2. $ ip l set eth1 down
^ This step deletes the interface from the router list
3. $ ip l set eth1 up
^ This step does not add it again because mcast snooping is disabled
4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
5. $ bridge -d -s mdb show
router ports on br0: eth1

Note: we can directly enable all ports (via __br_multicast_enable_port())
because the querier timer already checks the port state and will simply
expire if the port is in the blocking or disabled state. See the comment
added by commit 9aa66382163e7 ("bridge: multicast: add a comment to
br_port_state_selection about blocking state").
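The locked/unlocked split introduced by this patch can be sketched in userspace as follows. This is a hypothetical model, not bridge code: model_enable_port_locked() plays the role of __br_multicast_enable_port(), model_enable_port() the role of br_multicast_enable_port(), a bool stands in for multicast_lock (the assert() models that a spinlock is not recursive), and the port "up" flag stands in for the port-state check. The router-list update follows the behaviour described in the commit message rather than the exact 3.12 code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PORTS 4

struct model_port {
	bool perm_router;	/* multicast_router = 2 */
	bool up;
	bool in_router_list;
};

struct model_br {
	bool snooping;		/* !multicast_disabled */
	bool lock_held;		/* stand-in for multicast_lock */
	int nports;
	struct model_port ports[MAX_PORTS];
};

static void model_lock(struct model_br *br)
{
	assert(!br->lock_held);	/* a spinlock is not recursive */
	br->lock_held = true;
}

static void model_unlock(struct model_br *br)
{
	br->lock_held = false;
}

/* Caller must hold the lock (role of __br_multicast_enable_port()). */
static void model_enable_port_locked(struct model_br *br, struct model_port *p)
{
	assert(br->lock_held);
	if (!br->snooping || !p->up)
		return;
	if (p->perm_router)
		p->in_router_list = true;
}

/* Locking wrapper (role of br_multicast_enable_port()). */
static void model_enable_port(struct model_br *br, struct model_port *p)
{
	model_lock(br);
	model_enable_port_locked(br, p);
	model_unlock(br);
}

static void model_port_set_up(struct model_br *br, struct model_port *p, bool up)
{
	p->up = up;
	if (!up)
		p->in_router_list = false;	/* link down removes it from the list */
	else
		model_enable_port(br, p);
}

/* Toggle snooping; 'fixed' re-enables every port while holding the lock. */
static void model_toggle(struct model_br *br, bool on, bool fixed)
{
	model_lock(br);
	br->snooping = on;
	if (on && fixed)
		for (int i = 0; i < br->nports; i++)
			model_enable_port_locked(br, &br->ports[i]);
	model_unlock(br);
}

/* Reproduction steps 1-4 from the commit message; returns whether eth1
 * ends up in the router list. */
static bool model_repro(bool fixed)
{
	struct model_br br = {
		.snooping = true,
		.nports = 1,
		.ports = { { .perm_router = true, .up = true, .in_router_list = true } },
	};
	struct model_port *eth1 = &br.ports[0];

	model_toggle(&br, false, fixed);	/* 1. snooping off */
	model_port_set_up(&br, eth1, false);	/* 2. eth1 down */
	model_port_set_up(&br, eth1, true);	/* 3. eth1 up */
	model_toggle(&br, true, fixed);		/* 4. snooping on */
	return eth1->in_router_list;
}
```

Calling the locking wrapper from inside model_toggle() would trip the assert, which is why the toggle path needs the lock-free helper.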

Fixes: 561f1103a2b7 ("bridge: Add multicast_snooping sysfs toggle")
Reported-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 net/bridge/br_multicast.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 91fed8147c39..edb0eee5caf7 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -911,20 +911,25 @@ static void br_multicast_enable(struct bridge_mcast_query *query)
 		mod_timer(&query->timer, jiffies);
 }
 
-void br_multicast_enable_port(struct net_bridge_port *port)
+static void __br_multicast_enable_port(struct net_bridge_port *port)
 {
 	struct net_bridge *br = port->br;
 
-	spin_lock(&br->multicast_lock);
 	if (br->multicast_disabled || !netif_running(br->dev))
-		goto out;
+		return;
 
 	br_multicast_enable(&port->ip4_query);
 #if IS_ENABLED(CONFIG_IPV6)
 	br_multicast_enable(&port->ip6_query);
 #endif
+}
 
-out:
+void br_multicast_enable_port(struct net_bridge_port *port)
+{
+	struct net_bridge *br = port->br;
+
+	spin_lock(&br->multicast_lock);
+	__br_multicast_enable_port(port);
 	spin_unlock(&br->multicast_lock);
 }
 
@@ -1954,8 +1959,9 @@ static void br_multicast_start_querier(struct net_bridge *br,
 
 int br_multicast_toggle(struct net_bridge *br, unsigned long val)
 {
-	int err = 0;
 	struct net_bridge_mdb_htable *mdb;
+	struct net_bridge_port *port;
+	int err = 0;
 
 	spin_lock_bh(&br->multicast_lock);
 	if (br->multicast_disabled == !val)
@@ -1983,10 +1989,9 @@ rollback:
 			goto rollback;
 	}
 
-	br_multicast_start_querier(br, &br->ip4_query);
-#if IS_ENABLED(CONFIG_IPV6)
-	br_multicast_start_querier(br, &br->ip6_query);
-#endif
+	br_multicast_open(br);
+	list_for_each_entry(port, &br->port_list, list)
+		__br_multicast_enable_port(port);
 
 unlock:
 	spin_unlock_bh(&br->multicast_lock);
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [patch added to 3.12-stable] net: sctp, forbid negative length
  2016-11-24  9:17 [patch added to 3.12-stable] net: fix sk_mem_reclaim_partial() Jiri Slaby
@ 2016-11-24  9:17   ` Jiri Slaby
  2016-11-24  9:17 ` [patch added to 3.12-stable] net: avoid sk_forward_alloc overflows Jiri Slaby
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Jiri Slaby @ 2016-11-24  9:17 UTC (permalink / raw)
  To: stable
  Cc: Jiri Slaby, Vlad Yasevich, Neil Horman, David S. Miller,
	linux-sctp, netdev

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit a4b8e71b05c27bae6bad3bdecddbc6b68a3ad8cf ]

Most of the getsockopt handlers in net/sctp/socket.c check len against
the size of some structure, like:
        if (len < sizeof(int))
                return -EINVAL;

At first glance, the check seems correct. But since len is an int and
sizeof yields a size_t, len is converted to the unsigned size_t type for
the comparison. So the test is false for negative lengths. Yes,
(-1 < sizeof(long)) is false.
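The pitfall can be reproduced in plain userspace C. handler_check() copies the pattern quoted above, and minus_one_is_less() mirrors the (-1 < sizeof(long)) example; both names are hypothetical. On the usual ABIs, -1 converted to size_t is SIZE_MAX, so the "too short" branch is skipped.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* The per-handler check as written in the sctp getsockopt handlers. */
static int handler_check(int len)
{
	/* len is converted to size_t here, so a negative len becomes a very
	 * large unsigned value and the "too short" branch is not taken. */
	if (len < sizeof(int))
		return -EINVAL;
	return 0;
}

/* Mirrors the (-1 < sizeof(long)) example from the text. */
static int minus_one_is_less(void)
{
	return -1 < sizeof(long);
}
```

For instance, handler_check(2) returns -EINVAL, but handler_check(-1) returns 0: the negative length is accepted.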

Fix this in sctp by explicitly checking len < 0 before any getsockopt
handler is called.

Note that sctp_getsockopt_events() already handled the negative case.
Since the < 0 check is now done in sctp_getsockopt(), its len <= 0 test
is reduced to len == 0.
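A sketch of the centralized check, modeled loosely on sctp_getsockopt(): the dispatcher rejects negative lengths once, after which the handlers' existing len < sizeof(...) comparisons are safe. The option names, struct events and the handler functions are hypothetical stand-ins, and the copy to user space done by the real code is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical option names for this model only. */
enum { OPT_INT = 1, OPT_EVENTS = 2 };

/* Hypothetical stand-in for struct sctp_event_subscribe. */
struct events { unsigned char flags[10]; };

static int handle_int(int len)
{
	/* Safe only because the dispatcher already rejected len < 0. */
	if (len < sizeof(int))
		return -EINVAL;
	return sizeof(int);
}

static int handle_events(int len)
{
	/* Negative values never get here, so only 0 needs rejecting. */
	if (len == 0)
		return -EINVAL;
	if (len > sizeof(struct events))
		len = sizeof(struct events);
	return len;
}

static int getsockopt_dispatch(int optname, int len)
{
	/* Single check in front of every handler. */
	if (len < 0)
		return -EINVAL;

	switch (optname) {
	case OPT_INT:
		return handle_int(len);
	case OPT_EVENTS:
		return handle_events(len);
	default:
		return -ENOPROTOOPT;
	}
}
```

Each handler returns the number of bytes it would copy, or a negative errno.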

If not checked, this is the result:
UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19
shift exponent 52 is too large for 32-bit type 'int'
CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
 0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3
 ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270
 0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422
Call Trace:
 [<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300
...
 [<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90
 [<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220
 [<ffffffffb2819a30>] ? __kmalloc+0x330/0x540
 [<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp]
 [<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp]
 [<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150
 [<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 net/sctp/socket.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ead3a8adca08..98cd6606f4a4 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4247,7 +4247,7 @@ static int sctp_getsockopt_disable_fragments(struct sock *sk, int len,
 static int sctp_getsockopt_events(struct sock *sk, int len, char __user *optval,
 				  int __user *optlen)
 {
-	if (len <= 0)
+	if (len == 0)
 		return -EINVAL;
 	if (len > sizeof(struct sctp_event_subscribe))
 		len = sizeof(struct sctp_event_subscribe);
@@ -5758,6 +5758,9 @@ static int sctp_getsockopt(struct sock *sk, int level, int optname,
 	if (get_user(len, optlen))
 		return -EFAULT;
 
+	if (len < 0)
+		return -EINVAL;
+
 	sctp_lock_sock(sk);
 
 	switch (optname) {
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [patch added to 3.12-stable] net: sctp, forbid negative length
@ 2016-11-24  9:17   ` Jiri Slaby
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Slaby @ 2016-11-24  9:17 UTC (permalink / raw)
  To: stable
  Cc: Jiri Slaby, Vlad Yasevich, Neil Horman, David S. Miller,
	linux-sctp, netdev

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

=======
[ Upstream commit a4b8e71b05c27bae6bad3bdecddbc6b68a3ad8cf ]

Most of getsockopt handlers in net/sctp/socket.c check len against
sizeof some structure like:
        if (len < sizeof(int))
                return -EINVAL;

On the first look, the check seems to be correct. But since len is int
and sizeof returns size_t, int gets promoted to unsigned size_t too. So
the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is
false.

Fix this in sctp by explicitly checking len < 0 before any getsockopt
handler is called.

Note that sctp_getsockopt_events already handled the negative case.
Since we added the < 0 check elsewhere, this one can be removed.

If not checked, this is the result:
UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19
shift exponent 52 is too large for 32-bit type 'int'
CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
 0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3
 ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270
 0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422
Call Trace:
 [<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300
...
 [<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90
 [<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220
 [<ffffffffb2819a30>] ? __kmalloc+0x330/0x540
 [<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp]
 [<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp]
 [<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150
 [<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 net/sctp/socket.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ead3a8adca08..98cd6606f4a4 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4247,7 +4247,7 @@ static int sctp_getsockopt_disable_fragments(struct sock *sk, int len,
 static int sctp_getsockopt_events(struct sock *sk, int len, char __user *optval,
 				  int __user *optlen)
 {
-	if (len <= 0)
+	if (len = 0)
 		return -EINVAL;
 	if (len > sizeof(struct sctp_event_subscribe))
 		len = sizeof(struct sctp_event_subscribe);
@@ -5758,6 +5758,9 @@ static int sctp_getsockopt(struct sock *sk, int level, int optname,
 	if (get_user(len, optlen))
 		return -EFAULT;
 
+	if (len < 0)
+		return -EINVAL;
+
 	sctp_lock_sock(sk);
 
 	switch (optname) {
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [patch added to 3.12-stable] sctp: validate chunk len before actually using it
  2016-11-24  9:17 [patch added to 3.12-stable] net: fix sk_mem_reclaim_partial() Jiri Slaby
                   ` (9 preceding siblings ...)
  2016-11-24  9:17   ` Jiri Slaby
@ 2016-11-24  9:18 ` Jiri Slaby
  10 siblings, 0 replies; 15+ messages in thread
From: Jiri Slaby @ 2016-11-24  9:18 UTC (permalink / raw)
  To: stable; +Cc: Marcelo Ricardo Leitner, David S . Miller, Jiri Slaby

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit bf911e985d6bbaa328c20c3e05f4eb03de11fdd6 ]

Andrey Konovalov reported that KASAN detected SCTP accessing a slab
object beyond its boundaries. The cause: when handling out-of-the-blue
packets, sctp_sf_ootb() checked the chunk length only after it had
already processed the first chunk, so the check effectively validated
only the 2nd and subsequent chunks.

The fix simply moves the check up so that the 1st chunk is validated as
well.
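The corrected ordering, validating each chunk's end before any type-specific processing, can be sketched over a plain byte buffer. This is a hypothetical userspace model: chunk headers are assumed to be 4 bytes with a big-endian length at offset 2 (as in SCTP), ROUND4() stands in for WORD_ROUND(), and walk_chunks() returns -1 where the kernel would call sctp_sf_violation_chunklen(). Offsets are used instead of pointers so that no out-of-range pointer is ever formed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_HDR_LEN 4u	/* type, flags, 16-bit length */
/* Round up to a multiple of 4 (stand-in for WORD_ROUND()). */
#define ROUND4(x) (((x) + 3) & ~(size_t)3)

/* Walks the chunks in buf. Returns the number of chunks seen, or -1 on a
 * chunk length violation. */
static int walk_chunks(const uint8_t *buf, size_t buf_len)
{
	size_t off = 0, end;
	int count = 0;

	do {
		/* A full chunk header must be present before reading length. */
		if (buf_len - off < CHUNK_HDR_LEN)
			return -1;

		size_t len = ((size_t)buf[off + 2] << 8) | buf[off + 3];
		if (len < CHUNK_HDR_LEN)
			return -1;

		/* Check that this chunk fits BEFORE using its contents. */
		end = off + ROUND4(len);
		if (end > buf_len)
			return -1;

		count++;	/* type-specific processing would go here */
		off = end;
	} while (off < buf_len);

	return count;
}
```

A first chunk that claims 64 bytes inside an 8-byte buffer is now rejected before its body is touched.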

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 net/sctp/sm_statefuns.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 63a116c31a8b..ce6c8910f041 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -3427,6 +3427,12 @@ sctp_disposition_t sctp_sf_ootb(struct net *net,
 			return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
 						  commands);
 
+		/* Report violation if chunk len overflows */
+		ch_end = ((__u8 *)ch) + WORD_ROUND(ntohs(ch->length));
+		if (ch_end > skb_tail_pointer(skb))
+			return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
+						  commands);
+
 		/* Now that we know we at least have a chunk header,
 		 * do things that are type appropriate.
 		 */
@@ -3458,12 +3464,6 @@ sctp_disposition_t sctp_sf_ootb(struct net *net,
 			}
 		}
 
-		/* Report violation if chunk len overflows */
-		ch_end = ((__u8 *)ch) + WORD_ROUND(ntohs(ch->length));
-		if (ch_end > skb_tail_pointer(skb))
-			return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
-						  commands);
-
 		ch = (sctp_chunkhdr_t *) ch_end;
 	} while (ch_end < skb_tail_pointer(skb));
 
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* RE: [patch added to 3.12-stable] net: sctp, forbid negative length
  2016-11-24  9:17   ` Jiri Slaby
@ 2016-11-24 12:51     ` David Laight
  -1 siblings, 0 replies; 15+ messages in thread
From: David Laight @ 2016-11-24 12:51 UTC (permalink / raw)
  To: 'Jiri Slaby', stable
  Cc: Vlad Yasevich, Neil Horman, David S. Miller, linux-sctp, netdev

From: Jiri Slaby
> Sent: 24 November 2016 09:18
> This patch has been added to the 3.12 stable tree. If you have any
> objections, please let us know.
> 
> ===============
> 
> [ Upstream commit a4b8e71b05c27bae6bad3bdecddbc6b68a3ad8cf ]
> 
> Most of getsockopt handlers in net/sctp/socket.c check len against
> sizeof some structure like:
>         if (len < sizeof(int))
>                 return -EINVAL;
> 
> On the first look, the check seems to be correct. But since len is int
> and sizeof returns size_t, int gets promoted to unsigned size_t too. So
> the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is
> false.

Would it be worth adding the check in the generic setsockopt/getsockopt system
call code instead of in each and every protocol?
(Clearly for net-next, not stable.)

	David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [patch added to 3.12-stable] net: sctp, forbid negative length
@ 2016-11-24 12:51     ` David Laight
  0 siblings, 0 replies; 15+ messages in thread
From: David Laight @ 2016-11-24 12:51 UTC (permalink / raw)
  To: 'Jiri Slaby', stable
  Cc: Vlad Yasevich, Neil Horman, David S. Miller, linux-sctp, netdev

From: Jiri Slaby
> Sent: 24 November 2016 09:18
> This patch has been added to the 3.12 stable tree. If you have any
> objections, please let us know.
> 
> =======> 
> [ Upstream commit a4b8e71b05c27bae6bad3bdecddbc6b68a3ad8cf ]
> 
> Most of getsockopt handlers in net/sctp/socket.c check len against
> sizeof some structure like:
>         if (len < sizeof(int))
>                 return -EINVAL;
> 
> On the first look, the check seems to be correct. But since len is int
> and sizeof returns size_t, int gets promoted to unsigned size_t too. So
> the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is
> false.

Would it be worth adding the check in the generic setsockopt/getsockopt system
call code instead of in each and every protocol?
(Clearly for net-next, not stable.)

	David


^ permalink raw reply	[flat|nested] 15+ messages in thread

Thread overview: 15+ messages
2016-11-24  9:17 [patch added to 3.12-stable] net: fix sk_mem_reclaim_partial() Jiri Slaby
2016-11-24  9:17 ` [patch added to 3.12-stable] tcp: fix overflow in __tcp_retransmit_skb() Jiri Slaby
2016-11-24  9:17 ` [patch added to 3.12-stable] net: avoid sk_forward_alloc overflows Jiri Slaby
2016-11-24  9:17 ` [patch added to 3.12-stable] tcp: fix wrong checksum calculation on MTU probing Jiri Slaby
2016-11-24  9:17 ` [patch added to 3.12-stable] ip6_gre: fix flowi6_proto value in ip6gre_xmit_other() Jiri Slaby
2016-11-24  9:17 ` [patch added to 3.12-stable] ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route Jiri Slaby
2016-11-24  9:17 ` [patch added to 3.12-stable] net: Add netdev all_adj_list refcnt propagation to fix panic Jiri Slaby
2016-11-24  9:17 ` [patch added to 3.12-stable] packet: call fanout_release, while UNREGISTERING a netdev Jiri Slaby
2016-11-24  9:17 ` [patch added to 3.12-stable] ipv6: correctly add local routes when lo goes up Jiri Slaby
2016-11-24  9:17 ` [patch added to 3.12-stable] bridge: multicast: restore perm router ports on multicast enable Jiri Slaby
2016-11-24  9:17 ` [patch added to 3.12-stable] net: sctp, forbid negative length Jiri Slaby
2016-11-24  9:17   ` Jiri Slaby
2016-11-24 12:51   ` David Laight
2016-11-24 12:51     ` David Laight
2016-11-24  9:18 ` [patch added to 3.12-stable] sctp: validate chunk len before actually using it Jiri Slaby

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.