netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] net: don't set __LINK_STATE_START until after dev->open() call
@ 2017-08-07 22:24 Jacob Keller
  2017-08-09  1:16 ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Jacob Keller @ 2017-08-07 22:24 UTC (permalink / raw)
  To: netdev; +Cc: Jacob Keller

Fix an issue with relying on netif_running() which could be true during
when dev->open() handler is being called, even if it would exit with
a failure. This ensures the state does not get set and removed with
a narrow race for other callers to read it as open when infact it never
finished opening.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
I found this as a result of debugging a race condition in the i40evf
driver, in which we assumed that netif_running() would not be true until
after dev->open() had been called and succeeded. Unfortunately we can't
hold the rtnl_lock() while checking netif_running() because it would
cause a deadlock between our reset task and our ndo_open handler.

I am wondering whether the proposed change is acceptable here, or
whether some ndo_open handlers rely on __LINK_STATE_START being true
prior to their being called?

 net/core/dev.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1d75499add72..11953af90427 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1362,8 +1362,6 @@ static int __dev_open(struct net_device *dev)
 	if (ret)
 		return ret;
 
-	set_bit(__LINK_STATE_START, &dev->state);
-
 	if (ops->ndo_validate_addr)
 		ret = ops->ndo_validate_addr(dev);
 
@@ -1372,9 +1370,8 @@ static int __dev_open(struct net_device *dev)
 
 	netpoll_poll_enable(dev);
 
-	if (ret)
-		clear_bit(__LINK_STATE_START, &dev->state);
-	else {
+	if (!ret)
+		set_bit(__LINK_STATE_START, &dev->state);
 		dev->flags |= IFF_UP;
 		dev_set_rx_mode(dev);
 		dev_activate(dev);
-- 
2.14.0.rc1.251.g593d8d6362ce

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] net: don't set __LINK_STATE_START until after dev->open() call
  2017-08-07 22:24 [RFC PATCH] net: don't set __LINK_STATE_START until after dev->open() call Jacob Keller
@ 2017-08-09  1:16 ` David Miller
  2017-08-09 20:00   ` Keller, Jacob E
  0 siblings, 1 reply; 4+ messages in thread
From: David Miller @ 2017-08-09  1:16 UTC (permalink / raw)
  To: jacob.e.keller; +Cc: netdev

From: Jacob Keller <jacob.e.keller@intel.com>
Date: Mon,  7 Aug 2017 15:24:21 -0700

> Fix an issue with relying on netif_running() which could be true during
> when dev->open() handler is being called, even if it would exit with
> a failure. This ensures the state does not get set and removed with
> a narrow race for other callers to read it as open when infact it never
> finished opening.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
> I found this as a result of debugging a race condition in the i40evf
> driver, in which we assumed that netif_running() would not be true until
> after dev->open() had been called and succeeded. Unfortunately we can't
> hold the rtnl_lock() while checking netif_running() because it would
> cause a deadlock between our reset task and our ndo_open handler.
> 
> I am wondering whether the proposed change is acceptable here, or
> whether some ndo_open handlers rely on __LINK_STATE_START being true
> prior to their being called?

I think this has the potential to break a bunch of drivers, but I
cannot prove this.

A lot of drivers have several pieces of state setup when they bring
the device up.  And these routines are also invoked from other code
paths like suspend/resume, PCI-E error recovery, etc. and they
probably do netif_running() calls here and there.

This behavior has been this way for a very long time, so the risk is
quite high I think.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [RFC PATCH] net: don't set __LINK_STATE_START until after dev->open() call
  2017-08-09  1:16 ` David Miller
@ 2017-08-09 20:00   ` Keller, Jacob E
  2017-08-09 21:24     ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Keller, Jacob E @ 2017-08-09 20:00 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
> Behalf Of David Miller
> Sent: Tuesday, August 08, 2017 6:17 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] net: don't set __LINK_STATE_START until after dev->open()
> call
> 
> From: Jacob Keller <jacob.e.keller@intel.com>
> Date: Mon,  7 Aug 2017 15:24:21 -0700
> > I am wondering whether the proposed change is acceptable here, or
> > whether some ndo_open handlers rely on __LINK_STATE_START being true
> > prior to their being called?
> 
> I think this has the potential to break a bunch of drivers, but I
> cannot prove this.
> 
> A lot of drivers have several pieces of state setup when they bring
> the device up.  And these routines are also invoked from other code
> paths like suspend/resume, PCI-E error recovery, etc. and they
> probably do netif_running() calls here and there.
> 
> This behavior has been this way for a very long time, so the risk is
> quite high I think.

That's what I am worried about. However, I think there are problems with leaving it. A lot of drivers rely on netif_running() to determine whether or not the device is open, but they may be using it relying on all the changes caused by the .ndo_open() handler are finished. The current system there is a race, since you set the __LINK_STATE_START before .ndo_open is called.

This is what I found when attempting to debug a race in i40evf_open and i40evf_reset. The reset ended up running at the same time as open and the two flows collided because reset relied on netif_running() under the assumption that it would not return true until the device was actually open. However, it was running at the same time as open was, so it was trying to shutdown things at the same time as the open call was trying to bring them up.

Any location which uses rtnl_lock() will be safe, since the dev_open call is under rtnl_lock() so all callers of netif_running() under lock are serialized to see only the flag before or after dev_open completes. This is the majority of the callers I think.

Unfortunately, I can't really hold the rtnl_lock() during reset, since this caused a deadlock when I tried it due to lock ordering problems. I'm not sure if I could fix that or not.

I think there are other places which are incorrect, but haven't yet run into an issue. The only place I can see where the functionality might be changed for the worse is if a .dev_open handler actually relies on checking netif_running() during its call, which seems incredibly silly to me....

Thanks,
Jake

P.S. I apologize for the typo in this patch, if there is more discussion I can send a v2 which fixes it.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] net: don't set __LINK_STATE_START until after dev->open() call
  2017-08-09 20:00   ` Keller, Jacob E
@ 2017-08-09 21:24     ` David Miller
  0 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2017-08-09 21:24 UTC (permalink / raw)
  To: jacob.e.keller; +Cc: netdev

From: "Keller, Jacob E" <jacob.e.keller@intel.com>
Date: Wed, 9 Aug 2017 20:00:55 +0000

> That's what I am worried about. However, I think there are problems
> with leaving it. A lot of drivers rely on netif_running() to
> determine whether or not the device is open, but they may be using
> it relying on all the changes caused by the .ndo_open() handler are
> finished. The current system there is a race, since you set the
> __LINK_STATE_START before .ndo_open is called.

I think this is only a half-accurate description.

What drivers want to know is if initialization phase X of ndo_open()
has completed.

And honestly it must be like this because this is what one would
use to guide the teardown during failure paths of ndo_open(), right?

So I would really rather drivers internally track this "I initialized
X" state, as most drivers do, rather than take the risk of changing
the netif_running() behavior which can impact ~500 drivers :-)

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-08-09 21:24 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-07 22:24 [RFC PATCH] net: don't set __LINK_STATE_START until after dev->open() call Jacob Keller
2017-08-09  1:16 ` David Miller
2017-08-09 20:00   ` Keller, Jacob E
2017-08-09 21:24     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).