All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix a couple of crashes in netfront
@ 2018-01-11  9:36 Ross Lagerwall
  2018-01-11  9:36 ` [PATCH 1/2] xen/grant-table: Use put_page instead of free_page Ross Lagerwall
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11  9:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Ross Lagerwall, Boris Ostrovsky, Juergen Gross, linux-kernel

Here are a couple of patches to fix two crashes in netfront.

Ross Lagerwall (2):
  xen/grant-table: Use put_page instead of free_page
  xen-netfront: Fix race between device setup and open

 drivers/net/xen-netfront.c | 46 ++++++++++++++++++++++++----------------------
 drivers/xen/grant-table.c  |  4 ++--
 2 files changed, 26 insertions(+), 24 deletions(-)

-- 
2.9.5

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/2] xen/grant-table: Use put_page instead of free_page
  2018-01-11  9:36 [PATCH 0/2] Fix a couple of crashes in netfront Ross Lagerwall
  2018-01-11  9:36 ` [PATCH 1/2] xen/grant-table: Use put_page instead of free_page Ross Lagerwall
@ 2018-01-11  9:36 ` Ross Lagerwall
  2018-01-11 15:48   ` Ross Lagerwall
                     ` (3 more replies)
  2018-01-11  9:36 ` [PATCH 2/2] xen-netfront: Fix race between device setup and open Ross Lagerwall
                   ` (5 subsequent siblings)
  7 siblings, 4 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11  9:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Ross Lagerwall, Boris Ostrovsky, Juergen Gross, linux-kernel

The page given to gnttab_end_foreign_access() to free could be a
compound page so use put_page() instead of free_page() since it can
handle both compound and single pages correctly.

This bug was discovered when migrating a Xen VM with several VIFs and
CONFIG_DEBUG_VM enabled. It hits a BUG usually after fewer than 10
iterations. All netfront devices disconnect from the backend during a
suspend/resume and this will call gnttab_end_foreign_access() if a
netfront queue has an outstanding skb. The mismatch between calling
get_page() and free_page() on a compound page causes a reference
counting error which is detected when DEBUG_VM is enabled.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 drivers/xen/grant-table.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index f45114f..27be107 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -382,7 +382,7 @@ static void gnttab_handle_deferred(struct timer_list *unused)
 			if (entry->page) {
 				pr_debug("freeing g.e. %#x (pfn %#lx)\n",
 					 entry->ref, page_to_pfn(entry->page));
-				__free_page(entry->page);
+				put_page(entry->page);
 			} else
 				pr_info("freeing g.e. %#x\n", entry->ref);
 			kfree(entry);
@@ -438,7 +438,7 @@ void gnttab_end_foreign_access(grant_ref_t ref, int readonly,
 	if (gnttab_end_foreign_access_ref(ref, readonly)) {
 		put_free_entry(ref);
 		if (page != 0)
-			free_page(page);
+			put_page(virt_to_page(page));
 	} else
 		gnttab_add_deferred(ref, readonly,
 				    page ? virt_to_page(page) : NULL);
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 1/2] xen/grant-table: Use put_page instead of free_page
  2018-01-11  9:36 [PATCH 0/2] Fix a couple of crashes in netfront Ross Lagerwall
@ 2018-01-11  9:36 ` Ross Lagerwall
  2018-01-11  9:36 ` Ross Lagerwall
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11  9:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Ross Lagerwall, Boris Ostrovsky, linux-kernel

The page given to gnttab_end_foreign_access() to free could be a
compound page so use put_page() instead of free_page() since it can
handle both compound and single pages correctly.

This bug was discovered when migrating a Xen VM with several VIFs and
CONFIG_DEBUG_VM enabled. It hits a BUG usually after fewer than 10
iterations. All netfront devices disconnect from the backend during a
suspend/resume and this will call gnttab_end_foreign_access() if a
netfront queue has an outstanding skb. The mismatch between calling
get_page() and free_page() on a compound page causes a reference
counting error which is detected when DEBUG_VM is enabled.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 drivers/xen/grant-table.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index f45114f..27be107 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -382,7 +382,7 @@ static void gnttab_handle_deferred(struct timer_list *unused)
 			if (entry->page) {
 				pr_debug("freeing g.e. %#x (pfn %#lx)\n",
 					 entry->ref, page_to_pfn(entry->page));
-				__free_page(entry->page);
+				put_page(entry->page);
 			} else
 				pr_info("freeing g.e. %#x\n", entry->ref);
 			kfree(entry);
@@ -438,7 +438,7 @@ void gnttab_end_foreign_access(grant_ref_t ref, int readonly,
 	if (gnttab_end_foreign_access_ref(ref, readonly)) {
 		put_free_entry(ref);
 		if (page != 0)
-			free_page(page);
+			put_page(virt_to_page(page));
 	} else
 		gnttab_add_deferred(ref, readonly,
 				    page ? virt_to_page(page) : NULL);
-- 
2.9.5


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11  9:36 [PATCH 0/2] Fix a couple of crashes in netfront Ross Lagerwall
  2018-01-11  9:36 ` [PATCH 1/2] xen/grant-table: Use put_page instead of free_page Ross Lagerwall
  2018-01-11  9:36 ` Ross Lagerwall
@ 2018-01-11  9:36 ` Ross Lagerwall
  2018-01-11 15:26   ` David Miller
                     ` (3 more replies)
  2018-01-11  9:36 ` Ross Lagerwall
                   ` (4 subsequent siblings)
  7 siblings, 4 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11  9:36 UTC (permalink / raw)
  To: xen-devel
  Cc: Ross Lagerwall, Boris Ostrovsky, Juergen Gross, netdev, linux-kernel

When a netfront device is set up it registers a netdev fairly early on,
before it has set up the queues and is actually usable. A userspace tool
like NetworkManager will immediately try to open it and access its state
as soon as it appears. The bug can be reproduced by hotplugging VIFs
until the VM runs out of grant refs. It registers the netdev but fails
to set up any queues (since there are no more grant refs). In the
meantime, NetworkManager opens the device and the kernel crashes trying
to access the queues (of which there are none).

Fix this in two ways:
* For initial setup, register the netdev much later, after the queues
are setup. This avoids the race entirely.
* During a suspend/resume cycle, the frontend reconnects to the backend
and the queues are recreated. It is possible (though highly unlikely) to
race with something opening the device and accessing the queues after
they have been destroyed but before they have been recreated. Extend the
region covered by the rtnl semaphore to protect against this race. There
is a possibility that we fail to recreate the queues so check for this
in the open function.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 drivers/net/xen-netfront.c | 46 ++++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 9bd7dde..8328d39 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -351,6 +351,9 @@ static int xennet_open(struct net_device *dev)
 	unsigned int i = 0;
 	struct netfront_queue *queue = NULL;
 
+	if (!np->queues)
+		return -ENODEV;
+
 	for (i = 0; i < num_queues; ++i) {
 		queue = &np->queues[i];
 		napi_enable(&queue->napi);
@@ -1358,18 +1361,8 @@ static int netfront_probe(struct xenbus_device *dev,
 #ifdef CONFIG_SYSFS
 	info->netdev->sysfs_groups[0] = &xennet_dev_group;
 #endif
-	err = register_netdev(info->netdev);
-	if (err) {
-		pr_warn("%s: register_netdev err=%d\n", __func__, err);
-		goto fail;
-	}
 
 	return 0;
-
- fail:
-	xennet_free_netdev(netdev);
-	dev_set_drvdata(&dev->dev, NULL);
-	return err;
 }
 
 static void xennet_end_access(int ref, void *page)
@@ -1737,8 +1730,6 @@ static void xennet_destroy_queues(struct netfront_info *info)
 {
 	unsigned int i;
 
-	rtnl_lock();
-
 	for (i = 0; i < info->netdev->real_num_tx_queues; i++) {
 		struct netfront_queue *queue = &info->queues[i];
 
@@ -1747,8 +1738,6 @@ static void xennet_destroy_queues(struct netfront_info *info)
 		netif_napi_del(&queue->napi);
 	}
 
-	rtnl_unlock();
-
 	kfree(info->queues);
 	info->queues = NULL;
 }
@@ -1764,8 +1753,6 @@ static int xennet_create_queues(struct netfront_info *info,
 	if (!info->queues)
 		return -ENOMEM;
 
-	rtnl_lock();
-
 	for (i = 0; i < *num_queues; i++) {
 		struct netfront_queue *queue = &info->queues[i];
 
@@ -1774,7 +1761,7 @@ static int xennet_create_queues(struct netfront_info *info,
 
 		ret = xennet_init_queue(queue);
 		if (ret < 0) {
-			dev_warn(&info->netdev->dev,
+			dev_warn(&info->xbdev->dev,
 				 "only created %d queues\n", i);
 			*num_queues = i;
 			break;
@@ -1788,10 +1775,8 @@ static int xennet_create_queues(struct netfront_info *info,
 
 	netif_set_real_num_tx_queues(info->netdev, *num_queues);
 
-	rtnl_unlock();
-
 	if (*num_queues == 0) {
-		dev_err(&info->netdev->dev, "no queues\n");
+		dev_err(&info->xbdev->dev, "no queues\n");
 		return -EINVAL;
 	}
 	return 0;
@@ -1828,6 +1813,7 @@ static int talk_to_netback(struct xenbus_device *dev,
 		goto out;
 	}
 
+	rtnl_lock();
 	if (info->queues)
 		xennet_destroy_queues(info);
 
@@ -1838,6 +1824,7 @@ static int talk_to_netback(struct xenbus_device *dev,
 		info->queues = NULL;
 		goto out;
 	}
+	rtnl_unlock();
 
 	/* Create shared ring, alloc event channel -- for each queue */
 	for (i = 0; i < num_queues; ++i) {
@@ -1934,8 +1921,10 @@ static int talk_to_netback(struct xenbus_device *dev,
 	xenbus_transaction_end(xbt, 1);
  destroy_ring:
 	xennet_disconnect_backend(info);
+	rtnl_lock();
 	xennet_destroy_queues(info);
  out:
+	rtnl_unlock();
 	device_unregister(&dev->dev);
 	return err;
 }
@@ -1965,6 +1954,15 @@ static int xennet_connect(struct net_device *dev)
 	netdev_update_features(dev);
 	rtnl_unlock();
 
+	if (dev->reg_state == NETREG_UNINITIALIZED) {
+		err = register_netdev(dev);
+		if (err) {
+			pr_warn("%s: register_netdev err=%d\n", __func__, err);
+			device_unregister(&np->xbdev->dev);
+			return err;
+		}
+	}
+
 	/*
 	 * All public and private state should now be sane.  Get
 	 * ready to start sending and receiving packets and give the driver
@@ -2150,10 +2148,14 @@ static int xennet_remove(struct xenbus_device *dev)
 
 	xennet_disconnect_backend(info);
 
-	unregister_netdev(info->netdev);
+	if (info->netdev->reg_state == NETREG_REGISTERED)
+		unregister_netdev(info->netdev);
 
-	if (info->queues)
+	if (info->queues) {
+		rtnl_lock();
 		xennet_destroy_queues(info);
+		rtnl_unlock();
+	}
 	xennet_free_netdev(info->netdev);
 
 	return 0;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11  9:36 [PATCH 0/2] Fix a couple of crashes in netfront Ross Lagerwall
                   ` (2 preceding siblings ...)
  2018-01-11  9:36 ` [PATCH 2/2] xen-netfront: Fix race between device setup and open Ross Lagerwall
@ 2018-01-11  9:36 ` Ross Lagerwall
  2018-01-11 15:48 ` [PATCH 0/2] Fix a couple of crashes in netfront Ross Lagerwall
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11  9:36 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Ross Lagerwall, Boris Ostrovsky, linux-kernel, netdev

When a netfront device is set up it registers a netdev fairly early on,
before it has set up the queues and is actually usable. A userspace tool
like NetworkManager will immediately try to open it and access its state
as soon as it appears. The bug can be reproduced by hotplugging VIFs
until the VM runs out of grant refs. It registers the netdev but fails
to set up any queues (since there are no more grant refs). In the
meantime, NetworkManager opens the device and the kernel crashes trying
to access the queues (of which there are none).

Fix this in two ways:
* For initial setup, register the netdev much later, after the queues
are setup. This avoids the race entirely.
* During a suspend/resume cycle, the frontend reconnects to the backend
and the queues are recreated. It is possible (though highly unlikely) to
race with something opening the device and accessing the queues after
they have been destroyed but before they have been recreated. Extend the
region covered by the rtnl semaphore to protect against this race. There
is a possibility that we fail to recreate the queues so check for this
in the open function.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 drivers/net/xen-netfront.c | 46 ++++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 9bd7dde..8328d39 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -351,6 +351,9 @@ static int xennet_open(struct net_device *dev)
 	unsigned int i = 0;
 	struct netfront_queue *queue = NULL;
 
+	if (!np->queues)
+		return -ENODEV;
+
 	for (i = 0; i < num_queues; ++i) {
 		queue = &np->queues[i];
 		napi_enable(&queue->napi);
@@ -1358,18 +1361,8 @@ static int netfront_probe(struct xenbus_device *dev,
 #ifdef CONFIG_SYSFS
 	info->netdev->sysfs_groups[0] = &xennet_dev_group;
 #endif
-	err = register_netdev(info->netdev);
-	if (err) {
-		pr_warn("%s: register_netdev err=%d\n", __func__, err);
-		goto fail;
-	}
 
 	return 0;
-
- fail:
-	xennet_free_netdev(netdev);
-	dev_set_drvdata(&dev->dev, NULL);
-	return err;
 }
 
 static void xennet_end_access(int ref, void *page)
@@ -1737,8 +1730,6 @@ static void xennet_destroy_queues(struct netfront_info *info)
 {
 	unsigned int i;
 
-	rtnl_lock();
-
 	for (i = 0; i < info->netdev->real_num_tx_queues; i++) {
 		struct netfront_queue *queue = &info->queues[i];
 
@@ -1747,8 +1738,6 @@ static void xennet_destroy_queues(struct netfront_info *info)
 		netif_napi_del(&queue->napi);
 	}
 
-	rtnl_unlock();
-
 	kfree(info->queues);
 	info->queues = NULL;
 }
@@ -1764,8 +1753,6 @@ static int xennet_create_queues(struct netfront_info *info,
 	if (!info->queues)
 		return -ENOMEM;
 
-	rtnl_lock();
-
 	for (i = 0; i < *num_queues; i++) {
 		struct netfront_queue *queue = &info->queues[i];
 
@@ -1774,7 +1761,7 @@ static int xennet_create_queues(struct netfront_info *info,
 
 		ret = xennet_init_queue(queue);
 		if (ret < 0) {
-			dev_warn(&info->netdev->dev,
+			dev_warn(&info->xbdev->dev,
 				 "only created %d queues\n", i);
 			*num_queues = i;
 			break;
@@ -1788,10 +1775,8 @@ static int xennet_create_queues(struct netfront_info *info,
 
 	netif_set_real_num_tx_queues(info->netdev, *num_queues);
 
-	rtnl_unlock();
-
 	if (*num_queues == 0) {
-		dev_err(&info->netdev->dev, "no queues\n");
+		dev_err(&info->xbdev->dev, "no queues\n");
 		return -EINVAL;
 	}
 	return 0;
@@ -1828,6 +1813,7 @@ static int talk_to_netback(struct xenbus_device *dev,
 		goto out;
 	}
 
+	rtnl_lock();
 	if (info->queues)
 		xennet_destroy_queues(info);
 
@@ -1838,6 +1824,7 @@ static int talk_to_netback(struct xenbus_device *dev,
 		info->queues = NULL;
 		goto out;
 	}
+	rtnl_unlock();
 
 	/* Create shared ring, alloc event channel -- for each queue */
 	for (i = 0; i < num_queues; ++i) {
@@ -1934,8 +1921,10 @@ static int talk_to_netback(struct xenbus_device *dev,
 	xenbus_transaction_end(xbt, 1);
  destroy_ring:
 	xennet_disconnect_backend(info);
+	rtnl_lock();
 	xennet_destroy_queues(info);
  out:
+	rtnl_unlock();
 	device_unregister(&dev->dev);
 	return err;
 }
@@ -1965,6 +1954,15 @@ static int xennet_connect(struct net_device *dev)
 	netdev_update_features(dev);
 	rtnl_unlock();
 
+	if (dev->reg_state == NETREG_UNINITIALIZED) {
+		err = register_netdev(dev);
+		if (err) {
+			pr_warn("%s: register_netdev err=%d\n", __func__, err);
+			device_unregister(&np->xbdev->dev);
+			return err;
+		}
+	}
+
 	/*
 	 * All public and private state should now be sane.  Get
 	 * ready to start sending and receiving packets and give the driver
@@ -2150,10 +2148,14 @@ static int xennet_remove(struct xenbus_device *dev)
 
 	xennet_disconnect_backend(info);
 
-	unregister_netdev(info->netdev);
+	if (info->netdev->reg_state == NETREG_REGISTERED)
+		unregister_netdev(info->netdev);
 
-	if (info->queues)
+	if (info->queues) {
+		rtnl_lock();
 		xennet_destroy_queues(info);
+		rtnl_unlock();
+	}
 	xennet_free_netdev(info->netdev);
 
 	return 0;
-- 
2.9.5


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11  9:36 ` [PATCH 2/2] xen-netfront: Fix race between device setup and open Ross Lagerwall
@ 2018-01-11 15:26   ` David Miller
  2018-01-11 15:49     ` Ross Lagerwall
  2018-01-11 15:49     ` Ross Lagerwall
  2018-01-11 15:26   ` David Miller
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 26+ messages in thread
From: David Miller @ 2018-01-11 15:26 UTC (permalink / raw)
  To: ross.lagerwall; +Cc: xen-devel, boris.ostrovsky, jgross, netdev, linux-kernel

From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Thu, 11 Jan 2018 09:36:38 +0000

> When a netfront device is set up it registers a netdev fairly early on,
> before it has set up the queues and is actually usable. A userspace tool
> like NetworkManager will immediately try to open it and access its state
> as soon as it appears. The bug can be reproduced by hotplugging VIFs
> until the VM runs out of grant refs. It registers the netdev but fails
> to set up any queues (since there are no more grant refs). In the
> meantime, NetworkManager opens the device and the kernel crashes trying
> to access the queues (of which there are none).
> 
> Fix this in two ways:
> * For initial setup, register the netdev much later, after the queues
> are setup. This avoids the race entirely.
> * During a suspend/resume cycle, the frontend reconnects to the backend
> and the queues are recreated. It is possible (though highly unlikely) to
> race with something opening the device and accessing the queues after
> they have been destroyed but before they have been recreated. Extend the
> region covered by the rtnl semaphore to protect against this race. There
> is a possibility that we fail to recreate the queues so check for this
> in the open function.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

Where is patch 1/2 and the 0/2 header posting which explains what this
patch series is doing, how it is doing it, and why it is doing it that
way?

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11  9:36 ` [PATCH 2/2] xen-netfront: Fix race between device setup and open Ross Lagerwall
  2018-01-11 15:26   ` David Miller
@ 2018-01-11 15:26   ` David Miller
  2018-01-16 18:56   ` Boris Ostrovsky
  2018-01-16 18:56   ` Boris Ostrovsky
  3 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2018-01-11 15:26 UTC (permalink / raw)
  To: ross.lagerwall; +Cc: jgross, xen-devel, boris.ostrovsky, linux-kernel, netdev

From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Thu, 11 Jan 2018 09:36:38 +0000

> When a netfront device is set up it registers a netdev fairly early on,
> before it has set up the queues and is actually usable. A userspace tool
> like NetworkManager will immediately try to open it and access its state
> as soon as it appears. The bug can be reproduced by hotplugging VIFs
> until the VM runs out of grant refs. It registers the netdev but fails
> to set up any queues (since there are no more grant refs). In the
> meantime, NetworkManager opens the device and the kernel crashes trying
> to access the queues (of which there are none).
> 
> Fix this in two ways:
> * For initial setup, register the netdev much later, after the queues
> are setup. This avoids the race entirely.
> * During a suspend/resume cycle, the frontend reconnects to the backend
> and the queues are recreated. It is possible (though highly unlikely) to
> race with something opening the device and accessing the queues after
> they have been destroyed but before they have been recreated. Extend the
> region covered by the rtnl semaphore to protect against this race. There
> is a possibility that we fail to recreate the queues so check for this
> in the open function.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

Where is patch 1/2 and the 0/2 header posting which explains what this
patch series is doing, how it is doing it, and why it is doing it that
way?

Thanks.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/2] Fix a couple of crashes in netfront
  2018-01-11  9:36 [PATCH 0/2] Fix a couple of crashes in netfront Ross Lagerwall
                   ` (4 preceding siblings ...)
  2018-01-11 15:48 ` [PATCH 0/2] Fix a couple of crashes in netfront Ross Lagerwall
@ 2018-01-11 15:48 ` Ross Lagerwall
  2018-02-06 14:52 ` Juergen Gross
  2018-02-06 14:52 ` Juergen Gross
  7 siblings, 0 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11 15:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, Juergen Gross, linux-kernel, netdev

+CC netdev

On 01/11/2018 09:36 AM, Ross Lagerwall wrote:
> Here are a couple of patches to fix two crashes in netfront.
> 
> Ross Lagerwall (2):
>    xen/grant-table: Use put_page instead of free_page
>    xen-netfront: Fix race between device setup and open
> 
>   drivers/net/xen-netfront.c | 46 ++++++++++++++++++++++++----------------------
>   drivers/xen/grant-table.c  |  4 ++--
>   2 files changed, 26 insertions(+), 24 deletions(-)
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/2] Fix a couple of crashes in netfront
  2018-01-11  9:36 [PATCH 0/2] Fix a couple of crashes in netfront Ross Lagerwall
                   ` (3 preceding siblings ...)
  2018-01-11  9:36 ` Ross Lagerwall
@ 2018-01-11 15:48 ` Ross Lagerwall
  2018-01-11 15:48 ` Ross Lagerwall
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11 15:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, netdev, Boris Ostrovsky, linux-kernel

+CC netdev

On 01/11/2018 09:36 AM, Ross Lagerwall wrote:
> Here are a couple of patches to fix two crashes in netfront.
> 
> Ross Lagerwall (2):
>    xen/grant-table: Use put_page instead of free_page
>    xen-netfront: Fix race between device setup and open
> 
>   drivers/net/xen-netfront.c | 46 ++++++++++++++++++++++++----------------------
>   drivers/xen/grant-table.c  |  4 ++--
>   2 files changed, 26 insertions(+), 24 deletions(-)
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] xen/grant-table: Use put_page instead of free_page
  2018-01-11  9:36 ` Ross Lagerwall
@ 2018-01-11 15:48   ` Ross Lagerwall
  2018-01-11 15:48   ` Ross Lagerwall
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11 15:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, Juergen Gross, linux-kernel, netdev

+CC netdev

On 01/11/2018 09:36 AM, Ross Lagerwall wrote:
> The page given to gnttab_end_foreign_access() to free could be a
> compound page so use put_page() instead of free_page() since it can
> handle both compound and single pages correctly.
> 
> This bug was discovered when migrating a Xen VM with several VIFs and
> CONFIG_DEBUG_VM enabled. It hits a BUG usually after fewer than 10
> iterations. All netfront devices disconnect from the backend during a
> suspend/resume and this will call gnttab_end_foreign_access() if a
> netfront queue has an outstanding skb. The mismatch between calling
> get_page() and free_page() on a compound page causes a reference
> counting error which is detected when DEBUG_VM is enabled.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>   drivers/xen/grant-table.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index f45114f..27be107 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -382,7 +382,7 @@ static void gnttab_handle_deferred(struct timer_list *unused)
>   			if (entry->page) {
>   				pr_debug("freeing g.e. %#x (pfn %#lx)\n",
>   					 entry->ref, page_to_pfn(entry->page));
> -				__free_page(entry->page);
> +				put_page(entry->page);
>   			} else
>   				pr_info("freeing g.e. %#x\n", entry->ref);
>   			kfree(entry);
> @@ -438,7 +438,7 @@ void gnttab_end_foreign_access(grant_ref_t ref, int readonly,
>   	if (gnttab_end_foreign_access_ref(ref, readonly)) {
>   		put_free_entry(ref);
>   		if (page != 0)
> -			free_page(page);
> +			put_page(virt_to_page(page));
>   	} else
>   		gnttab_add_deferred(ref, readonly,
>   				    page ? virt_to_page(page) : NULL);
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] xen/grant-table: Use put_page instead of free_page
  2018-01-11  9:36 ` Ross Lagerwall
  2018-01-11 15:48   ` Ross Lagerwall
@ 2018-01-11 15:48   ` Ross Lagerwall
  2018-01-16 18:47   ` Boris Ostrovsky
  2018-01-16 18:47   ` Boris Ostrovsky
  3 siblings, 0 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11 15:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, netdev, Boris Ostrovsky, linux-kernel

+CC netdev

On 01/11/2018 09:36 AM, Ross Lagerwall wrote:
> The page given to gnttab_end_foreign_access() to free could be a
> compound page so use put_page() instead of free_page() since it can
> handle both compound and single pages correctly.
> 
> This bug was discovered when migrating a Xen VM with several VIFs and
> CONFIG_DEBUG_VM enabled. It hits a BUG usually after fewer than 10
> iterations. All netfront devices disconnect from the backend during a
> suspend/resume and this will call gnttab_end_foreign_access() if a
> netfront queue has an outstanding skb. The mismatch between calling
> get_page() and free_page() on a compound page causes a reference
> counting error which is detected when DEBUG_VM is enabled.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>   drivers/xen/grant-table.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index f45114f..27be107 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -382,7 +382,7 @@ static void gnttab_handle_deferred(struct timer_list *unused)
>   			if (entry->page) {
>   				pr_debug("freeing g.e. %#x (pfn %#lx)\n",
>   					 entry->ref, page_to_pfn(entry->page));
> -				__free_page(entry->page);
> +				put_page(entry->page);
>   			} else
>   				pr_info("freeing g.e. %#x\n", entry->ref);
>   			kfree(entry);
> @@ -438,7 +438,7 @@ void gnttab_end_foreign_access(grant_ref_t ref, int readonly,
>   	if (gnttab_end_foreign_access_ref(ref, readonly)) {
>   		put_free_entry(ref);
>   		if (page != 0)
> -			free_page(page);
> +			put_page(virt_to_page(page));
>   	} else
>   		gnttab_add_deferred(ref, readonly,
>   				    page ? virt_to_page(page) : NULL);
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11 15:26   ` David Miller
@ 2018-01-11 15:49     ` Ross Lagerwall
  2018-01-11 16:08       ` David Miller
  2018-01-11 16:08       ` David Miller
  2018-01-11 15:49     ` Ross Lagerwall
  1 sibling, 2 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11 15:49 UTC (permalink / raw)
  To: David Miller; +Cc: xen-devel, boris.ostrovsky, jgross, netdev, linux-kernel

On 01/11/2018 03:26 PM, David Miller wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> Date: Thu, 11 Jan 2018 09:36:38 +0000
> 
>> When a netfront device is set up it registers a netdev fairly early on,
>> before it has set up the queues and is actually usable. A userspace tool
>> like NetworkManager will immediately try to open it and access its state
>> as soon as it appears. The bug can be reproduced by hotplugging VIFs
>> until the VM runs out of grant refs. It registers the netdev but fails
>> to set up any queues (since there are no more grant refs). In the
>> meantime, NetworkManager opens the device and the kernel crashes trying
>> to access the queues (of which there are none).
>>
>> Fix this in two ways:
>> * For initial setup, register the netdev much later, after the queues
>> are setup. This avoids the race entirely.
>> * During a suspend/resume cycle, the frontend reconnects to the backend
>> and the queues are recreated. It is possible (though highly unlikely) to
>> race with something opening the device and accessing the queues after
>> they have been destroyed but before they have been recreated. Extend the
>> region covered by the rtnl semaphore to protect against this race. There
>> is a possibility that we fail to recreate the queues so check for this
>> in the open function.
>>
>> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> Where is patch 1/2 and the 0/2 header posting which explains what this
> patch series is doing, how it is doing it, and why it is doing it that
> way?
> 

I've now added CC'd netdev on the other two.

Cheers,
-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11 15:26   ` David Miller
  2018-01-11 15:49     ` Ross Lagerwall
@ 2018-01-11 15:49     ` Ross Lagerwall
  1 sibling, 0 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11 15:49 UTC (permalink / raw)
  To: David Miller; +Cc: jgross, xen-devel, boris.ostrovsky, linux-kernel, netdev

On 01/11/2018 03:26 PM, David Miller wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> Date: Thu, 11 Jan 2018 09:36:38 +0000
> 
>> When a netfront device is set up it registers a netdev fairly early on,
>> before it has set up the queues and is actually usable. A userspace tool
>> like NetworkManager will immediately try to open it and access its state
>> as soon as it appears. The bug can be reproduced by hotplugging VIFs
>> until the VM runs out of grant refs. It registers the netdev but fails
>> to set up any queues (since there are no more grant refs). In the
>> meantime, NetworkManager opens the device and the kernel crashes trying
>> to access the queues (of which there are none).
>>
>> Fix this in two ways:
>> * For initial setup, register the netdev much later, after the queues
>> are setup. This avoids the race entirely.
>> * During a suspend/resume cycle, the frontend reconnects to the backend
>> and the queues are recreated. It is possible (though highly unlikely) to
>> race with something opening the device and accessing the queues after
>> they have been destroyed but before they have been recreated. Extend the
>> region covered by the rtnl semaphore to protect against this race. There
>> is a possibility that we fail to recreate the queues so check for this
>> in the open function.
>>
>> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> Where is patch 1/2 and the 0/2 header posting which explains what this
> patch series is doing, how it is doing it, and why it is doing it that
> way?
> 

I've now added CC'd netdev on the other two.

Cheers,
-- 
Ross Lagerwall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11 15:49     ` Ross Lagerwall
  2018-01-11 16:08       ` David Miller
@ 2018-01-11 16:08       ` David Miller
  2018-01-11 16:27         ` Ross Lagerwall
  2018-01-11 16:27         ` Ross Lagerwall
  1 sibling, 2 replies; 26+ messages in thread
From: David Miller @ 2018-01-11 16:08 UTC (permalink / raw)
  To: ross.lagerwall; +Cc: xen-devel, boris.ostrovsky, jgross, netdev, linux-kernel

From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Thu, 11 Jan 2018 15:49:07 +0000

> I've now added CC'd netdev on the other two.

That doesn't work.

If you don't post the originals explicitly to netdev then it won't
get properly queued in patchwork.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11 15:49     ` Ross Lagerwall
@ 2018-01-11 16:08       ` David Miller
  2018-01-11 16:08       ` David Miller
  1 sibling, 0 replies; 26+ messages in thread
From: David Miller @ 2018-01-11 16:08 UTC (permalink / raw)
  To: ross.lagerwall; +Cc: jgross, xen-devel, boris.ostrovsky, linux-kernel, netdev

From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Thu, 11 Jan 2018 15:49:07 +0000

> I've now added CC'd netdev on the other two.

That doesn't work.

If you don't post the originals explicitly to netdev then it won't
get properly queued in patchwork.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11 16:08       ` David Miller
  2018-01-11 16:27         ` Ross Lagerwall
@ 2018-01-11 16:27         ` Ross Lagerwall
  2018-01-11 16:36           ` David Miller
  2018-01-11 16:36           ` David Miller
  1 sibling, 2 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11 16:27 UTC (permalink / raw)
  To: David Miller; +Cc: xen-devel, boris.ostrovsky, jgross, netdev, linux-kernel

On 01/11/2018 04:08 PM, David Miller wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> Date: Thu, 11 Jan 2018 15:49:07 +0000
> 
>> I've now added CC'd netdev on the other two.
> 
> That doesn't work.
> 
> If you don't post the originals explicitly to netdev then it won't
> get properly queued in patchwork.
> 

The series fixes two crashes when using netfront. The first is actually 
fixed in Xen common code, not netfront, which is why I only CC'd netdev 
on the second. I can resend the originals to netdev if you want but IMO 
it is not necessary.

-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11 16:08       ` David Miller
@ 2018-01-11 16:27         ` Ross Lagerwall
  2018-01-11 16:27         ` Ross Lagerwall
  1 sibling, 0 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11 16:27 UTC (permalink / raw)
  To: David Miller; +Cc: jgross, xen-devel, boris.ostrovsky, linux-kernel, netdev

On 01/11/2018 04:08 PM, David Miller wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> Date: Thu, 11 Jan 2018 15:49:07 +0000
> 
>> I've now added CC'd netdev on the other two.
> 
> That doesn't work.
> 
> If you don't post the originals explicitly to netdev then it won't
> get properly queued in patchwork.
> 

The series fixes two crashes when using netfront. The first is actually 
fixed in Xen common code, not netfront, which is why I only CC'd netdev 
on the second. I can resend the originals to netdev if you want but IMO 
it is not necessary.

-- 
Ross Lagerwall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11 16:27         ` Ross Lagerwall
  2018-01-11 16:36           ` David Miller
@ 2018-01-11 16:36           ` David Miller
  1 sibling, 0 replies; 26+ messages in thread
From: David Miller @ 2018-01-11 16:36 UTC (permalink / raw)
  To: ross.lagerwall; +Cc: xen-devel, boris.ostrovsky, jgross, netdev, linux-kernel

From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Thu, 11 Jan 2018 16:27:20 +0000

> On 01/11/2018 04:08 PM, David Miller wrote:
>> From: Ross Lagerwall <ross.lagerwall@citrix.com>
>> Date: Thu, 11 Jan 2018 15:49:07 +0000
>> 
>>> I've now added CC'd netdev on the other two.
>> That doesn't work.
>> If you don't post the originals explicitly to netdev then it won't
>> get properly queued in patchwork.
>> 
> 
> The series fixes two crashes when using netfront. The first is
> actually fixed in Xen common code, not netfront, which is why I only
> CC'd netdev on the second. I can resend the originals to netdev if you
> want but IMO it is not necessary.

Everyone who reviews the networking side will appreciate the full
context of the patch series so they can review the networking part
in proper context.

And if it is decided that these changes should go via my GIT tree
wouldn't you want it to be all setup properly for that already without
having to do anything more on your part?

I don't see what the resistence is to giving people more information
and allowing more flexibility for integrating and reviewing your
work.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11 16:27         ` Ross Lagerwall
@ 2018-01-11 16:36           ` David Miller
  2018-01-11 16:36           ` David Miller
  1 sibling, 0 replies; 26+ messages in thread
From: David Miller @ 2018-01-11 16:36 UTC (permalink / raw)
  To: ross.lagerwall; +Cc: jgross, xen-devel, boris.ostrovsky, linux-kernel, netdev

From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Thu, 11 Jan 2018 16:27:20 +0000

> On 01/11/2018 04:08 PM, David Miller wrote:
>> From: Ross Lagerwall <ross.lagerwall@citrix.com>
>> Date: Thu, 11 Jan 2018 15:49:07 +0000
>> 
>>> I've now added CC'd netdev on the other two.
>> That doesn't work.
>> If you don't post the originals explicitly to netdev then it won't
>> get properly queued in patchwork.
>> 
> 
> The series fixes two crashes when using netfront. The first is
> actually fixed in Xen common code, not netfront, which is why I only
> CC'd netdev on the second. I can resend the originals to netdev if you
> want but IMO it is not necessary.

Everyone who reviews the networking side will appreciate the full
context of the patch series so they can review the networking part
in proper context.

And if it is decided that these changes should go via my GIT tree
wouldn't you want it to be all setup properly for that already without
having to do anything more on your part?

I don't see what the resistence is to giving people more information
and allowing more flexibility for integrating and reviewing your
work.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] xen/grant-table: Use put_page instead of free_page
  2018-01-11  9:36 ` Ross Lagerwall
                     ` (2 preceding siblings ...)
  2018-01-16 18:47   ` Boris Ostrovsky
@ 2018-01-16 18:47   ` Boris Ostrovsky
  3 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2018-01-16 18:47 UTC (permalink / raw)
  To: Ross Lagerwall, xen-devel; +Cc: Juergen Gross, linux-kernel

On 01/11/2018 04:36 AM, Ross Lagerwall wrote:
> The page given to gnttab_end_foreign_access() to free could be a
> compound page so use put_page() instead of free_page() since it can
> handle both compound and single pages correctly.
>
> This bug was discovered when migrating a Xen VM with several VIFs and
> CONFIG_DEBUG_VM enabled. It hits a BUG usually after fewer than 10
> iterations. All netfront devices disconnect from the backend during a
> suspend/resume and this will call gnttab_end_foreign_access() if a
> netfront queue has an outstanding skb. The mismatch between calling
> get_page() and free_page() on a compound page causes a reference
> counting error which is detected when DEBUG_VM is enabled.
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] xen/grant-table: Use put_page instead of free_page
  2018-01-11  9:36 ` Ross Lagerwall
  2018-01-11 15:48   ` Ross Lagerwall
  2018-01-11 15:48   ` Ross Lagerwall
@ 2018-01-16 18:47   ` Boris Ostrovsky
  2018-01-16 18:47   ` Boris Ostrovsky
  3 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2018-01-16 18:47 UTC (permalink / raw)
  To: Ross Lagerwall, xen-devel; +Cc: Juergen Gross, linux-kernel

On 01/11/2018 04:36 AM, Ross Lagerwall wrote:
> The page given to gnttab_end_foreign_access() to free could be a
> compound page so use put_page() instead of free_page() since it can
> handle both compound and single pages correctly.
>
> This bug was discovered when migrating a Xen VM with several VIFs and
> CONFIG_DEBUG_VM enabled. It hits a BUG usually after fewer than 10
> iterations. All netfront devices disconnect from the backend during a
> suspend/resume and this will call gnttab_end_foreign_access() if a
> netfront queue has an outstanding skb. The mismatch between calling
> get_page() and free_page() on a compound page causes a reference
> counting error which is detected when DEBUG_VM is enabled.
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11  9:36 ` [PATCH 2/2] xen-netfront: Fix race between device setup and open Ross Lagerwall
  2018-01-11 15:26   ` David Miller
  2018-01-11 15:26   ` David Miller
@ 2018-01-16 18:56   ` Boris Ostrovsky
  2018-01-16 18:56   ` Boris Ostrovsky
  3 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2018-01-16 18:56 UTC (permalink / raw)
  To: Ross Lagerwall, xen-devel; +Cc: Juergen Gross, netdev, linux-kernel

On 01/11/2018 04:36 AM, Ross Lagerwall wrote:
> When a netfront device is set up it registers a netdev fairly early on,
> before it has set up the queues and is actually usable. A userspace tool
> like NetworkManager will immediately try to open it and access its state
> as soon as it appears. The bug can be reproduced by hotplugging VIFs
> until the VM runs out of grant refs. It registers the netdev but fails
> to set up any queues (since there are no more grant refs). In the
> meantime, NetworkManager opens the device and the kernel crashes trying
> to access the queues (of which there are none).
>
> Fix this in two ways:
> * For initial setup, register the netdev much later, after the queues
> are setup. This avoids the race entirely.
> * During a suspend/resume cycle, the frontend reconnects to the backend
> and the queues are recreated. It is possible (though highly unlikely) to
> race with something opening the device and accessing the queues after
> they have been destroyed but before they have been recreated. Extend the
> region covered by the rtnl semaphore to protect against this race. There
> is a possibility that we fail to recreate the queues so check for this
> in the open function.
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
  2018-01-11  9:36 ` [PATCH 2/2] xen-netfront: Fix race between device setup and open Ross Lagerwall
                     ` (2 preceding siblings ...)
  2018-01-16 18:56   ` Boris Ostrovsky
@ 2018-01-16 18:56   ` Boris Ostrovsky
  3 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2018-01-16 18:56 UTC (permalink / raw)
  To: Ross Lagerwall, xen-devel; +Cc: Juergen Gross, netdev, linux-kernel

On 01/11/2018 04:36 AM, Ross Lagerwall wrote:
> When a netfront device is set up it registers a netdev fairly early on,
> before it has set up the queues and is actually usable. A userspace tool
> like NetworkManager will immediately try to open it and access its state
> as soon as it appears. The bug can be reproduced by hotplugging VIFs
> until the VM runs out of grant refs. It registers the netdev but fails
> to set up any queues (since there are no more grant refs). In the
> meantime, NetworkManager opens the device and the kernel crashes trying
> to access the queues (of which there are none).
>
> Fix this in two ways:
> * For initial setup, register the netdev much later, after the queues
> are setup. This avoids the race entirely.
> * During a suspend/resume cycle, the frontend reconnects to the backend
> and the queues are recreated. It is possible (though highly unlikely) to
> race with something opening the device and accessing the queues after
> they have been destroyed but before they have been recreated. Extend the
> region covered by the rtnl semaphore to protect against this race. There
> is a possibility that we fail to recreate the queues so check for this
> in the open function.
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/2] Fix a couple of crashes in netfront
  2018-01-11  9:36 [PATCH 0/2] Fix a couple of crashes in netfront Ross Lagerwall
                   ` (5 preceding siblings ...)
  2018-01-11 15:48 ` Ross Lagerwall
@ 2018-02-06 14:52 ` Juergen Gross
  2018-02-06 14:52 ` Juergen Gross
  7 siblings, 0 replies; 26+ messages in thread
From: Juergen Gross @ 2018-02-06 14:52 UTC (permalink / raw)
  To: Ross Lagerwall, xen-devel; +Cc: Boris Ostrovsky, linux-kernel

On 11/01/18 10:36, Ross Lagerwall wrote:
> Here are a couple of patches to fix two crashes in netfront.
> 
> Ross Lagerwall (2):
>   xen/grant-table: Use put_page instead of free_page
>   xen-netfront: Fix race between device setup and open
> 
>  drivers/net/xen-netfront.c | 46 ++++++++++++++++++++++++----------------------
>  drivers/xen/grant-table.c  |  4 ++--
>  2 files changed, 26 insertions(+), 24 deletions(-)
> 

Both committed to xen.tip for-linus-4.16


Juergen

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/2] Fix a couple of crashes in netfront
  2018-01-11  9:36 [PATCH 0/2] Fix a couple of crashes in netfront Ross Lagerwall
                   ` (6 preceding siblings ...)
  2018-02-06 14:52 ` Juergen Gross
@ 2018-02-06 14:52 ` Juergen Gross
  7 siblings, 0 replies; 26+ messages in thread
From: Juergen Gross @ 2018-02-06 14:52 UTC (permalink / raw)
  To: Ross Lagerwall, xen-devel; +Cc: Boris Ostrovsky, linux-kernel

On 11/01/18 10:36, Ross Lagerwall wrote:
> Here are a couple of patches to fix two crashes in netfront.
> 
> Ross Lagerwall (2):
>   xen/grant-table: Use put_page instead of free_page
>   xen-netfront: Fix race between device setup and open
> 
>  drivers/net/xen-netfront.c | 46 ++++++++++++++++++++++++----------------------
>  drivers/xen/grant-table.c  |  4 ++--
>  2 files changed, 26 insertions(+), 24 deletions(-)
> 

Both committed to xen.tip for-linus-4.16


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/2] Fix a couple of crashes in netfront
@ 2018-01-11  9:36 Ross Lagerwall
  0 siblings, 0 replies; 26+ messages in thread
From: Ross Lagerwall @ 2018-01-11  9:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Ross Lagerwall, Boris Ostrovsky, linux-kernel

Here are a couple of patches to fix two crashes in netfront.

Ross Lagerwall (2):
  xen/grant-table: Use put_page instead of free_page
  xen-netfront: Fix race between device setup and open

 drivers/net/xen-netfront.c | 46 ++++++++++++++++++++++++----------------------
 drivers/xen/grant-table.c  |  4 ++--
 2 files changed, 26 insertions(+), 24 deletions(-)

-- 
2.9.5


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-02-06 14:53 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-11  9:36 [PATCH 0/2] Fix a couple of crashes in netfront Ross Lagerwall
2018-01-11  9:36 ` [PATCH 1/2] xen/grant-table: Use put_page instead of free_page Ross Lagerwall
2018-01-11  9:36 ` Ross Lagerwall
2018-01-11 15:48   ` Ross Lagerwall
2018-01-11 15:48   ` Ross Lagerwall
2018-01-16 18:47   ` Boris Ostrovsky
2018-01-16 18:47   ` Boris Ostrovsky
2018-01-11  9:36 ` [PATCH 2/2] xen-netfront: Fix race between device setup and open Ross Lagerwall
2018-01-11 15:26   ` David Miller
2018-01-11 15:49     ` Ross Lagerwall
2018-01-11 16:08       ` David Miller
2018-01-11 16:08       ` David Miller
2018-01-11 16:27         ` Ross Lagerwall
2018-01-11 16:27         ` Ross Lagerwall
2018-01-11 16:36           ` David Miller
2018-01-11 16:36           ` David Miller
2018-01-11 15:49     ` Ross Lagerwall
2018-01-11 15:26   ` David Miller
2018-01-16 18:56   ` Boris Ostrovsky
2018-01-16 18:56   ` Boris Ostrovsky
2018-01-11  9:36 ` Ross Lagerwall
2018-01-11 15:48 ` [PATCH 0/2] Fix a couple of crashes in netfront Ross Lagerwall
2018-01-11 15:48 ` Ross Lagerwall
2018-02-06 14:52 ` Juergen Gross
2018-02-06 14:52 ` Juergen Gross
2018-01-11  9:36 Ross Lagerwall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.