netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv3 net 0/2] net: fixes for device unregistration
@ 2015-07-09  6:59 Julian Anastasov
  2015-07-09  6:59 ` [PATCHv3 net 1/2] net: do not process device backlog during unregistration Julian Anastasov
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Julian Anastasov @ 2015-07-09  6:59 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Eric W. Biederman, Stephen Hemminger, Vittorio G (VittGam)

Test script from Eric W. Biederman can catch a problem
where packets from backlog are processed long after the last
synchronize_net call. This can be reproduced after few tests
if commit 381c759d9916 ("ipv4: Avoid crashing in ip_error")
is reverted for the test. Incoming packets do not hold
reference to device but even if they do, subsystems do not
expect packets to fly during and after the NETDEV_UNREGISTER
event.

The first fix has the cost of netif_running check in fast path.
The second fix calls rcu_read_lock while local IRQ is disabled,
I hope this is not against the rules.

Julian Anastasov (2):
  net: do not process device backlog during unregistration
  net: call rcu_read_lock early in process_backlog

 net/core/dev.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCHv3 net 1/2] net: do not process device backlog during unregistration
  2015-07-09  6:59 [PATCHv3 net 0/2] net: fixes for device unregistration Julian Anastasov
@ 2015-07-09  6:59 ` Julian Anastasov
  2015-07-09  6:59 ` [PATCHv3 net 2/2] net: call rcu_read_lock early in process_backlog Julian Anastasov
  2015-07-11  1:17 ` [PATCHv3 net 0/2] net: fixes for device unregistration David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Julian Anastasov @ 2015-07-09  6:59 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Eric W. Biederman, Stephen Hemminger, Vittorio G (VittGam)

commit 381c759d9916 ("ipv4: Avoid crashing in ip_error")
fixes a problem where processed packet comes from device
with destroyed inetdev (dev->ip_ptr). This is not expected
because inetdev_destroy is called in NETDEV_UNREGISTER
phase and packets should not be processed after
dev_close_many() and synchronize_net(). Above fix is still
required because inetdev_destroy can be called for other
reasons. But it shows the real problem: backlog can keep
packets for long time and they do not hold reference to
device. Such packets are then delivered to upper levels
at the same time when device is unregistered.
Calling flush_backlog after NETDEV_UNREGISTER_FINAL still
accounts all packets from backlog but before that some packets
continue to be delivered to upper levels long after the
synchronize_net call which is supposed to wait the last
ones. Also, as Eric pointed out, processed packets, mostly
from other devices, can continue to add new packets to backlog.

Fix the problem by moving flush_backlog early, after the
device driver is stopped and before the synchronize_net() call.
Then use netif_running check to make sure we do not add more
packets to backlog. We have to do it in enqueue_to_backlog
context when the local IRQ is disabled. As result, after the
flush_backlog and synchronize_net sequence all packets
should be accounted.

Thanks to Eric W. Biederman for the test script and his
valuable feedback!

Reported-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue")
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/core/dev.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1e57efd..6dd126a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3448,6 +3448,8 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
 	local_irq_save(flags);
 
 	rps_lock(sd);
+	if (!netif_running(skb->dev))
+		goto drop;
 	qlen = skb_queue_len(&sd->input_pkt_queue);
 	if (qlen <= netdev_max_backlog && !skb_flow_limit(skb, qlen)) {
 		if (qlen) {
@@ -3469,6 +3471,7 @@ enqueue:
 		goto enqueue;
 	}
 
+drop:
 	sd->dropped++;
 	rps_unlock(sd);
 
@@ -6135,6 +6138,7 @@ static void rollback_registered_many(struct list_head *head)
 		unlist_netdevice(dev);
 
 		dev->reg_state = NETREG_UNREGISTERING;
+		on_each_cpu(flush_backlog, dev, 1);
 	}
 
 	synchronize_net();
@@ -6770,8 +6774,6 @@ void netdev_run_todo(void)
 
 		dev->reg_state = NETREG_UNREGISTERED;
 
-		on_each_cpu(flush_backlog, dev, 1);
-
 		netdev_wait_allrefs(dev);
 
 		/* paranoia */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCHv3 net 2/2] net: call rcu_read_lock early in process_backlog
  2015-07-09  6:59 [PATCHv3 net 0/2] net: fixes for device unregistration Julian Anastasov
  2015-07-09  6:59 ` [PATCHv3 net 1/2] net: do not process device backlog during unregistration Julian Anastasov
@ 2015-07-09  6:59 ` Julian Anastasov
  2015-07-11  1:17 ` [PATCHv3 net 0/2] net: fixes for device unregistration David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Julian Anastasov @ 2015-07-09  6:59 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Eric W. Biederman, Stephen Hemminger, Vittorio G (VittGam)

Incoming packet should be either in backlog queue or
in RCU read-side section. Otherwise, the final sequence of
flush_backlog() and synchronize_net() may miss packets
that can run without device reference:

CPU 1                  CPU 2
                       skb->dev: no reference
                       process_backlog:__skb_dequeue
                       process_backlog:local_irq_enable

on_each_cpu for
flush_backlog =>       IPI(hardirq): flush_backlog
                       - packet not found in backlog

                       CPU delayed ...
synchronize_net
- no ongoing RCU
read-side sections

netdev_run_todo,
rcu_barrier: no
ongoing callbacks
                       __netif_receive_skb_core:rcu_read_lock
                       - too late
free dev
                       process packet for freed dev

Fixes: 6e583ce5242f ("net: eliminate refcounting in backlog queue")
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 net/core/dev.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 6dd126a..a8e4dd4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3774,8 +3774,6 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
 
 	pt_prev = NULL;
 
-	rcu_read_lock();
-
 another_round:
 	skb->skb_iif = skb->dev->ifindex;
 
@@ -3785,7 +3783,7 @@ another_round:
 	    skb->protocol == cpu_to_be16(ETH_P_8021AD)) {
 		skb = skb_vlan_untag(skb);
 		if (unlikely(!skb))
-			goto unlock;
+			goto out;
 	}
 
 #ifdef CONFIG_NET_CLS_ACT
@@ -3815,10 +3813,10 @@ skip_taps:
 	if (static_key_false(&ingress_needed)) {
 		skb = handle_ing(skb, &pt_prev, &ret, orig_dev);
 		if (!skb)
-			goto unlock;
+			goto out;
 
 		if (nf_ingress(skb, &pt_prev, &ret, orig_dev) < 0)
-			goto unlock;
+			goto out;
 	}
 #endif
 #ifdef CONFIG_NET_CLS_ACT
@@ -3836,7 +3834,7 @@ ncls:
 		if (vlan_do_receive(&skb))
 			goto another_round;
 		else if (unlikely(!skb))
-			goto unlock;
+			goto out;
 	}
 
 	rx_handler = rcu_dereference(skb->dev->rx_handler);
@@ -3848,7 +3846,7 @@ ncls:
 		switch (rx_handler(&skb)) {
 		case RX_HANDLER_CONSUMED:
 			ret = NET_RX_SUCCESS;
-			goto unlock;
+			goto out;
 		case RX_HANDLER_ANOTHER:
 			goto another_round;
 		case RX_HANDLER_EXACT:
@@ -3902,8 +3900,7 @@ drop:
 		ret = NET_RX_DROP;
 	}
 
-unlock:
-	rcu_read_unlock();
+out:
 	return ret;
 }
 
@@ -3934,29 +3931,30 @@ static int __netif_receive_skb(struct sk_buff *skb)
 
 static int netif_receive_skb_internal(struct sk_buff *skb)
 {
+	int ret;
+
 	net_timestamp_check(netdev_tstamp_prequeue, skb);
 
 	if (skb_defer_rx_timestamp(skb))
 		return NET_RX_SUCCESS;
 
+	rcu_read_lock();
+
 #ifdef CONFIG_RPS
 	if (static_key_false(&rps_needed)) {
 		struct rps_dev_flow voidflow, *rflow = &voidflow;
-		int cpu, ret;
-
-		rcu_read_lock();
-
-		cpu = get_rps_cpu(skb->dev, skb, &rflow);
+		int cpu = get_rps_cpu(skb->dev, skb, &rflow);
 
 		if (cpu >= 0) {
 			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
 			rcu_read_unlock();
 			return ret;
 		}
-		rcu_read_unlock();
 	}
 #endif
-	return __netif_receive_skb(skb);
+	ret = __netif_receive_skb(skb);
+	rcu_read_unlock();
+	return ret;
 }
 
 /**
@@ -4501,8 +4499,10 @@ static int process_backlog(struct napi_struct *napi, int quota)
 		struct sk_buff *skb;
 
 		while ((skb = __skb_dequeue(&sd->process_queue))) {
+			rcu_read_lock();
 			local_irq_enable();
 			__netif_receive_skb(skb);
+			rcu_read_unlock();
 			local_irq_disable();
 			input_queue_head_incr(sd);
 			if (++work >= quota) {
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCHv3 net 0/2] net: fixes for device unregistration
  2015-07-09  6:59 [PATCHv3 net 0/2] net: fixes for device unregistration Julian Anastasov
  2015-07-09  6:59 ` [PATCHv3 net 1/2] net: do not process device backlog during unregistration Julian Anastasov
  2015-07-09  6:59 ` [PATCHv3 net 2/2] net: call rcu_read_lock early in process_backlog Julian Anastasov
@ 2015-07-11  1:17 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2015-07-11  1:17 UTC (permalink / raw)
  To: ja; +Cc: netdev, ebiederm, stephen, linuxbugs

From: Julian Anastasov <ja@ssi.bg>
Date: Thu,  9 Jul 2015 09:59:08 +0300

> Test script from Eric W. Biederman can catch a problem
> where packets from backlog are processed long after the last
> synchronize_net call. This can be reproduced after few tests
> if commit 381c759d9916 ("ipv4: Avoid crashing in ip_error")
> is reverted for the test. Incoming packets do not hold
> reference to device but even if they do, subsystems do not
> expect packets to fly during and after the NETDEV_UNREGISTER
> event.
> 
> The first fix has the cost of netif_running check in fast path.
> The second fix calls rcu_read_lock while local IRQ is disabled,
> I hope this is not against the rules.

Thanks for fixing this tricky bug.

Applied and queued up for -stable.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-07-11  6:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-09  6:59 [PATCHv3 net 0/2] net: fixes for device unregistration Julian Anastasov
2015-07-09  6:59 ` [PATCHv3 net 1/2] net: do not process device backlog during unregistration Julian Anastasov
2015-07-09  6:59 ` [PATCHv3 net 2/2] net: call rcu_read_lock early in process_backlog Julian Anastasov
2015-07-11  1:17 ` [PATCHv3 net 0/2] net: fixes for device unregistration David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).