linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] xen-netfront: avoid crashing on resume after a failure in talk_to_netback()
@ 2017-05-04 12:23 Vitaly Kuznetsov
  2017-05-04 15:21 ` David Miller
  0 siblings, 1 reply; 3+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-04 12:23 UTC (permalink / raw)
  To: xen-devel; +Cc: netdev, linux-kernel, Boris Ostrovsky, Juergen Gross

Unavoidable crashes in netfront_resume() and netback_changed() after a
previous fail in talk_to_netback() (e.g. when we fail to read MAC from
xenstore) were discovered. The failure path in talk_to_netback() does
unregister/free for netdev but we don't reset drvdata and we try accessing
it again after resume.

Reset drvdata in netback_changed() the same way we reset it in
netfront_probe() and check for NULL in both netfront_resume() and
netback_changed() to properly handle the situation.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
I apologize for sending this during the merge window, I'm not sure if this
needs to go through xen or netdev tree. I think the fix is simple enough
and it would make sense to squeeze it in 4.12 if possible. Thanks.
---
 drivers/net/xen-netfront.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 6ffc482..92f746c 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1426,6 +1426,9 @@ static int netfront_resume(struct xenbus_device *dev)
 {
 	struct netfront_info *info = dev_get_drvdata(&dev->dev);
 
+	if (!info)
+		return 0;
+
 	dev_dbg(&dev->dev, "%s\n", dev->nodename);
 
 	xennet_disconnect_backend(info);
@@ -1936,6 +1939,7 @@ static int talk_to_netback(struct xenbus_device *dev,
  out:
 	unregister_netdev(info->netdev);
 	xennet_free_netdev(info->netdev);
+	dev_set_drvdata(&dev->dev, NULL);
 	return err;
 }
 
@@ -1997,7 +2001,12 @@ static void netback_changed(struct xenbus_device *dev,
 			    enum xenbus_state backend_state)
 {
 	struct netfront_info *np = dev_get_drvdata(&dev->dev);
-	struct net_device *netdev = np->netdev;
+	struct net_device *netdev;
+
+	if (!np)
+		return;
+
+	netdev = np->netdev;
 
 	dev_dbg(&dev->dev, "%s\n", xenbus_strstate(backend_state));
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] xen-netfront: avoid crashing on resume after a failure in talk_to_netback()
  2017-05-04 12:23 [PATCH] xen-netfront: avoid crashing on resume after a failure in talk_to_netback() Vitaly Kuznetsov
@ 2017-05-04 15:21 ` David Miller
  2017-05-05 14:40   ` Vitaly Kuznetsov
  0 siblings, 1 reply; 3+ messages in thread
From: David Miller @ 2017-05-04 15:21 UTC (permalink / raw)
  To: vkuznets; +Cc: xen-devel, netdev, linux-kernel, boris.ostrovsky, jgross

From: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Thu,  4 May 2017 14:23:04 +0200

> Unavoidable crashes in netfront_resume() and netback_changed() after a
> previous fail in talk_to_netback() (e.g. when we fail to read MAC from
> xenstore) were discovered. The failure path in talk_to_netback() does
> unregister/free for netdev but we don't reset drvdata and we try accessing
> it again after resume.
> 
> Reset drvdata in netback_changed() the same way we reset it in
> netfront_probe() and check for NULL in both netfront_resume() and
> netback_changed() to properly handle the situation.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

The circumstances under which netfront_probe() NULLs out the device
private is different than what you propose here, which is to do it
on a live device in netback_changed() whilst mutliple susbsytems
have a reference to this device and can call into the driver still.

It is only legal to do this in the probe function because such
references and execution possibilities do not exist at that point.

What really needs to happen is that the xenbus_driver must be told to
unregister this xen device and stop making calls into the driver for
it before you release the netdev state.

That is the only reasonable way to fix this bug.

Thanks.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] xen-netfront: avoid crashing on resume after a failure in talk_to_netback()
  2017-05-04 15:21 ` David Miller
@ 2017-05-05 14:40   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 3+ messages in thread
From: Vitaly Kuznetsov @ 2017-05-05 14:40 UTC (permalink / raw)
  To: David Miller; +Cc: xen-devel, netdev, linux-kernel, boris.ostrovsky, jgross

David Miller <davem@davemloft.net> writes:

> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> Date: Thu,  4 May 2017 14:23:04 +0200
>
>> Unavoidable crashes in netfront_resume() and netback_changed() after a
>> previous fail in talk_to_netback() (e.g. when we fail to read MAC from
>> xenstore) were discovered. The failure path in talk_to_netback() does
>> unregister/free for netdev but we don't reset drvdata and we try accessing
>> it again after resume.
>> 
>> Reset drvdata in netback_changed() the same way we reset it in
>> netfront_probe() and check for NULL in both netfront_resume() and
>> netback_changed() to properly handle the situation.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> The circumstances under which netfront_probe() NULLs out the device
> private is different than what you propose here, which is to do it
> on a live device in netback_changed() whilst mutliple susbsytems
> have a reference to this device and can call into the driver still.
>
> It is only legal to do this in the probe function because such
> references and execution possibilities do not exist at that point.
>
> What really needs to happen is that the xenbus_driver must be told to
> unregister this xen device and stop making calls into the driver for
> it before you release the netdev state.
>
> That is the only reasonable way to fix this bug.

True,

after looking at the issue again I realized that removing half of the
device in talk_to_netback() is a mistake - we should either treat errors
as fatal and remove the device completely or leave netdev in place
hoping that it'll magically got fixed later. I'm leaning towards the
former, I tried and the following simple patch does the job:

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 6ffc482..7b61adb 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1934,8 +1934,7 @@ static int talk_to_netback(struct xenbus_device *dev,
        xennet_disconnect_backend(info);
        xennet_destroy_queues(info);
  out:
-       unregister_netdev(info->netdev);
-       xennet_free_netdev(info->netdev);
+       device_unregister(&dev->dev);
        return err;
 }

In case noone is against this big hammer I can send this as v2.

Thank you for your feedback, David!

-- 
  Vitaly

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-05-05 14:40 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-04 12:23 [PATCH] xen-netfront: avoid crashing on resume after a failure in talk_to_netback() Vitaly Kuznetsov
2017-05-04 15:21 ` David Miller
2017-05-05 14:40   ` Vitaly Kuznetsov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).