linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] xen-netfront: fix potential deadlock in xennet_remove()
@ 2020-07-22  6:52 Andrea Righi
  2020-07-23 21:57 ` David Miller
  0 siblings, 1 reply; 3+ messages in thread
From: Andrea Righi @ 2020-07-22  6:52 UTC (permalink / raw)
  To: Boris Ostrovsky, Juergen Gross, Stefano Stabellini
  Cc: David S. Miller, Jakub Kicinski, xen-devel, netdev, linux-kernel

There's a potential race in xennet_remove(); this is what the driver is
doing upon unregistering a network device:

  1. state = read bus state
  2. if state is not "Closed":
  3.    request to set state to "Closing"
  4.    wait for state to be set to "Closing"
  5.    request to set state to "Closed"
  6.    wait for state to be set to "Closed"

If the state changes to "Closed" immediately after step 1 we are stuck
forever in step 4, because the state will never go back from "Closed" to
"Closing".

Make sure to check also for state == "Closed" in step 4 to prevent the
deadlock.

Also add a 5 sec timeout any time we wait for the bus state to change,
to avoid getting stuck forever in wait_event() and add a debug message
to help tracking down potential similar issues.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
---
 drivers/net/xen-netfront.c | 79 +++++++++++++++++++++++++++-----------
 1 file changed, 57 insertions(+), 22 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 482c6c8b0fb7..e09caba93dd9 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -63,6 +63,8 @@ module_param_named(max_queues, xennet_max_queues, uint, 0644);
 MODULE_PARM_DESC(max_queues,
 		 "Maximum number of queues per virtual interface");
 
+#define XENNET_TIMEOUT  (5 * HZ)
+
 static const struct ethtool_ops xennet_ethtool_ops;
 
 struct netfront_cb {
@@ -1334,12 +1336,20 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 
 	netif_carrier_off(netdev);
 
-	xenbus_switch_state(dev, XenbusStateInitialising);
-	wait_event(module_wq,
-		   xenbus_read_driver_state(dev->otherend) !=
-		   XenbusStateClosed &&
-		   xenbus_read_driver_state(dev->otherend) !=
-		   XenbusStateUnknown);
+	do {
+		dev_dbg(&dev->dev,
+			"%s: switching to XenbusStateInitialising\n",
+			dev->nodename);
+		xenbus_switch_state(dev, XenbusStateInitialising);
+		err = wait_event_timeout(module_wq,
+				 xenbus_read_driver_state(dev->otherend) !=
+				 XenbusStateClosed &&
+				 xenbus_read_driver_state(dev->otherend) !=
+				 XenbusStateUnknown, XENNET_TIMEOUT);
+		dev_dbg(&dev->dev, "%s: state = %d\n", dev->nodename,
+			xenbus_read_driver_state(dev->otherend));
+	} while (!err);
+
 	return netdev;
 
  exit:
@@ -2139,28 +2149,53 @@ static const struct attribute_group xennet_dev_group = {
 };
 #endif /* CONFIG_SYSFS */
 
-static int xennet_remove(struct xenbus_device *dev)
+static void xennet_bus_close(struct xenbus_device *dev)
 {
-	struct netfront_info *info = dev_get_drvdata(&dev->dev);
-
-	dev_dbg(&dev->dev, "%s\n", dev->nodename);
+	int ret;
 
-	if (xenbus_read_driver_state(dev->otherend) != XenbusStateClosed) {
+	if (xenbus_read_driver_state(dev->otherend) == XenbusStateClosed)
+		return;
+	do {
+		dev_dbg(&dev->dev, "%s: switching to XenbusStateClosing\n",
+			dev->nodename);
 		xenbus_switch_state(dev, XenbusStateClosing);
-		wait_event(module_wq,
-			   xenbus_read_driver_state(dev->otherend) ==
-			   XenbusStateClosing ||
-			   xenbus_read_driver_state(dev->otherend) ==
-			   XenbusStateUnknown);
+		ret = wait_event_timeout(module_wq,
+				   xenbus_read_driver_state(dev->otherend) ==
+				   XenbusStateClosing ||
+				   xenbus_read_driver_state(dev->otherend) ==
+				   XenbusStateClosed ||
+				   xenbus_read_driver_state(dev->otherend) ==
+				   XenbusStateUnknown,
+				   XENNET_TIMEOUT);
+		dev_dbg(&dev->dev, "%s: state = %d\n", dev->nodename,
+			xenbus_read_driver_state(dev->otherend));
+	} while (!ret);
+
+	if (xenbus_read_driver_state(dev->otherend) == XenbusStateClosed)
+		return;
 
+	do {
+		dev_dbg(&dev->dev, "%s: switching to XenbusStateClosed\n",
+			dev->nodename);
 		xenbus_switch_state(dev, XenbusStateClosed);
-		wait_event(module_wq,
-			   xenbus_read_driver_state(dev->otherend) ==
-			   XenbusStateClosed ||
-			   xenbus_read_driver_state(dev->otherend) ==
-			   XenbusStateUnknown);
-	}
+		ret = wait_event_timeout(module_wq,
+				   xenbus_read_driver_state(dev->otherend) ==
+				   XenbusStateClosed ||
+				   xenbus_read_driver_state(dev->otherend) ==
+				   XenbusStateUnknown,
+				   XENNET_TIMEOUT);
+		dev_dbg(&dev->dev, "%s: state = %d\n", dev->nodename,
+			xenbus_read_driver_state(dev->otherend));
+	} while (!ret);
+}
+
+static int xennet_remove(struct xenbus_device *dev)
+{
+	struct netfront_info *info = dev_get_drvdata(&dev->dev);
+
+	dev_dbg(&dev->dev, "%s\n", dev->nodename);
 
+	xennet_bus_close(dev);
 	xennet_disconnect_backend(info);
 
 	if (info->netdev->reg_state == NETREG_REGISTERED)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] xen-netfront: fix potential deadlock in xennet_remove()
  2020-07-22  6:52 [PATCH] xen-netfront: fix potential deadlock in xennet_remove() Andrea Righi
@ 2020-07-23 21:57 ` David Miller
  2020-07-24  6:30   ` Andrea Righi
  0 siblings, 1 reply; 3+ messages in thread
From: David Miller @ 2020-07-23 21:57 UTC (permalink / raw)
  To: andrea.righi
  Cc: boris.ostrovsky, jgross, sstabellini, kuba, xen-devel, netdev,
	linux-kernel

From: Andrea Righi <andrea.righi@canonical.com>
Date: Wed, 22 Jul 2020 08:52:11 +0200

> +static int xennet_remove(struct xenbus_device *dev)
> +{
> +	struct netfront_info *info = dev_get_drvdata(&dev->dev);
> +
> +	dev_dbg(&dev->dev, "%s\n", dev->nodename);

These kinds of debugging messages provide zero context and are so much
less useful than simply using tracepoints which are more universally
available than printk debugging facilities.

Please remove all of the dev_dbg() calls from this patch.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] xen-netfront: fix potential deadlock in xennet_remove()
  2020-07-23 21:57 ` David Miller
@ 2020-07-24  6:30   ` Andrea Righi
  0 siblings, 0 replies; 3+ messages in thread
From: Andrea Righi @ 2020-07-24  6:30 UTC (permalink / raw)
  To: David Miller
  Cc: boris.ostrovsky, jgross, sstabellini, kuba, xen-devel, netdev,
	linux-kernel

On Thu, Jul 23, 2020 at 02:57:22PM -0700, David Miller wrote:
> From: Andrea Righi <andrea.righi@canonical.com>
> Date: Wed, 22 Jul 2020 08:52:11 +0200
> 
> > +static int xennet_remove(struct xenbus_device *dev)
> > +{
> > +	struct netfront_info *info = dev_get_drvdata(&dev->dev);
> > +
> > +	dev_dbg(&dev->dev, "%s\n", dev->nodename);
> 
> These kinds of debugging messages provide zero context and are so much
> less useful than simply using tracepoints which are more universally
> available than printk debugging facilities.
> 
> Please remove all of the dev_dbg() calls from this patch.

I didn't add that dev_dbg() call, it's just the old code moved around,
but I agree, I'll remove that call and send a new version of this patch.

Thanks for looking at it!
-Andrea

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-07-24  6:30 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-22  6:52 [PATCH] xen-netfront: fix potential deadlock in xennet_remove() Andrea Righi
2020-07-23 21:57 ` David Miller
2020-07-24  6:30   ` Andrea Righi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).