linux-kernel.vger.kernel.org archive mirror
* [PATCH] virtio_net: enable tx after resuming from suspend
@ 2018-10-11  7:51 Ake Koomsin
  2018-10-11  9:44 ` Jason Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Ake Koomsin @ 2018-10-11  7:51 UTC (permalink / raw)
  To: Jason Wang
  Cc: ake, Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel

commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
disabled the virtio tx before going to suspend to avoid a use after free.
However, after resuming, it causes the virtio_net device to lose its
network connectivity.

To solve the issue, we need to enable tx after resuming.

Fixes: 713a98d90c5e ("virtio-net: serialize tx routine during reset")
Signed-off-by: Ake Koomsin <ake@igel.co.jp>
---
 drivers/net/virtio_net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index dab504ec5e50..3453d80f5f81 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 	}
 
 	netif_device_attach(vi->dev);
+	netif_start_queue(vi->dev);
 	return err;
 }
 
-- 
2.19.1



* Re: [PATCH] virtio_net: enable tx after resuming from suspend
  2018-10-11  7:51 [PATCH] virtio_net: enable tx after resuming from suspend Ake Koomsin
@ 2018-10-11  9:44 ` Jason Wang
  2018-10-11 10:22   ` ake
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2018-10-11  9:44 UTC (permalink / raw)
  To: Ake Koomsin
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel



On 2018-10-11 15:51, Ake Koomsin wrote:
> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> disabled the virtio tx before going to suspend to avoid a use after free.
> However, after resuming, it causes the virtio_net device to lose its
> network connectivity.
>
> To solve the issue, we need to enable tx after resuming.
>
> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
> ---
>   drivers/net/virtio_net.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index dab504ec5e50..3453d80f5f81 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>   	}
>   
>   	netif_device_attach(vi->dev);
> +	netif_start_queue(vi->dev);

I believe this duplicates the netif_tx_wake_all_queues() call in
netif_device_attach() above?

Thanks

>   	return err;
>   }
>   



* Re: [PATCH] virtio_net: enable tx after resuming from suspend
  2018-10-11  9:44 ` Jason Wang
@ 2018-10-11 10:22   ` ake
  2018-10-11 13:06     ` Jason Wang
  0 siblings, 1 reply; 18+ messages in thread
From: ake @ 2018-10-11 10:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel



On 2018-10-11 18:44, Jason Wang wrote:
>
>
> On 2018-10-11 15:51, Ake Koomsin wrote:
>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>> disabled the virtio tx before going to suspend to avoid a use after free.
>> However, after resuming, it causes the virtio_net device to lose its
>> network connectivity.
>>
>> To solve the issue, we need to enable tx after resuming.
>>
>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during
>> reset")
>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>> ---
>>   drivers/net/virtio_net.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index dab504ec5e50..3453d80f5f81 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>> virtio_device *vdev)
>>       }
>>         netif_device_attach(vi->dev);
>> +    netif_start_queue(vi->dev);
> 
> I believe this is duplicated with netif_tx_wake_all_queues() in
> netif_device_attach() above?

Thank you for your review.

If both netif_tx_wake_all_queues() and netif_start_queue() result in
clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
condition in netif_device_attach() is not satisfied? Without
netif_start_queue(), the virtio_net device does not resume properly
after waking up.
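
For reference, here is a minimal userspace sketch of what the two helpers
do to the per-queue state. It is a simplified model, not kernel code: the
model_* names and MODEL_DRV_XOFF are invented for illustration, locking is
omitted, and the qdisc rescheduling done by the real wake path is reduced
to a flag. It shows that netif_start_queue() touches only queue 0, while
netif_tx_wake_all_queues() walks every tx queue.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_TXQ 4
#define MODEL_DRV_XOFF (1u << 0)	/* stands in for __QUEUE_STATE_DRV_XOFF */

/* Hypothetical stand-in for a tx queue; not the kernel's netdev_queue. */
struct model_txq {
	unsigned int state;	/* bit set => driver has stopped this queue */
	bool rescheduled;	/* stands in for the qdisc being rescheduled */
};

struct model_dev {
	struct model_txq txq[MODEL_MAX_TXQ];
	size_t num_txq;
};

/* Models netif_tx_disable(): stop every tx queue (locking omitted). */
void model_tx_disable(struct model_dev *dev)
{
	for (size_t i = 0; i < dev->num_txq; i++)
		dev->txq[i].state |= MODEL_DRV_XOFF;
}

/* Models netif_start_queue(): clear the bit on queue 0 only. */
void model_start_queue(struct model_dev *dev)
{
	dev->txq[0].state &= ~MODEL_DRV_XOFF;
}

/*
 * Models netif_tx_wake_all_queues(): clear the bit on every queue and
 * reschedule each queue that was actually stopped.
 */
void model_wake_all_queues(struct model_dev *dev)
{
	for (size_t i = 0; i < dev->num_txq; i++) {
		struct model_txq *q = &dev->txq[i];

		if (q->state & MODEL_DRV_XOFF) {
			q->state &= ~MODEL_DRV_XOFF;
			q->rescheduled = true;
		}
	}
}
```

With a single tx queue the two calls leave the bit in the same state,
which is why the extra netif_start_queue() looks redundant whenever the
wake path inside netif_device_attach() really runs.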

Is it better to report this as a bug first? If I am to do more
investigation, what areas should I look into?


Best Regards
Ake Koomsin



* Re: [PATCH] virtio_net: enable tx after resuming from suspend
  2018-10-11 10:22   ` ake
@ 2018-10-11 13:06     ` Jason Wang
  2018-10-12  4:30       ` ake
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2018-10-11 13:06 UTC (permalink / raw)
  To: ake
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel



On 2018-10-11 18:22, ake wrote:
>
> On 2018-10-11 18:44, Jason Wang wrote:
>>
>> On 2018-10-11 15:51, Ake Koomsin wrote:
>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>> disabled the virtio tx before going to suspend to avoid a use after free.
>>> However, after resuming, it causes the virtio_net device to lose its
>>> network connectivity.
>>>
>>> To solve the issue, we need to enable tx after resuming.
>>>
>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>> reset")
>>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>>> ---
>>>    drivers/net/virtio_net.c | 1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index dab504ec5e50..3453d80f5f81 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>> virtio_device *vdev)
>>>        }
>>>          netif_device_attach(vi->dev);
>>> +    netif_start_queue(vi->dev);
>> I believe this is duplicated with netif_tx_wake_all_queues() in
>> netif_device_attach() above?
> Thank you for your review.
>
> If both netif_tx_wake_all_queues() and netif_start_queue() result in
> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
> conditions in netif_device_attach() is not satisfied?

Yes, maybe. One case I can see is when the device is down; in that
case netif_device_attach() won't try to wake up the queue.
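
A minimal userspace sketch of that condition, for illustration only. The
struct and the model_* names are hypothetical stand-ins (three booleans
in place of the real device and queue state), and the watchdog handling
of the real function is left out. It only captures the rule described
above: attach wakes the queues when the device was not already marked
present and the interface is running.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant net_device state. */
struct model_netdev {
	bool present;		/* models __LINK_STATE_PRESENT */
	bool running;		/* models netif_running(): interface is up */
	bool tx_stopped;	/* models the driver-side "queue stopped" bit */
};

/*
 * Models netif_device_attach(): mark the device present, and wake the
 * tx queues only if it was previously detached AND is running.
 */
void model_device_attach(struct model_netdev *dev)
{
	bool was_present = dev->present;

	dev->present = true;
	if (!was_present && dev->running)
		dev->tx_stopped = false;
}
```

In this model, a device that is down (running == false) with its queue
already stopped keeps tx_stopped set after attach, so something else has
to restart the queue later.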

>   Without
> netif_start_queue(), the virtio_net device does not resume properly
> after waking up.

How do you trigger the issue? Just do suspend/resume?

>
> Is it better to report this as a bug first?

Nope, you're very welcome to post a patch directly.

> If I am to do more
> investigation, what areas should I look into?

As you've figured out, you can start with why netif_tx_wake_all_queues()
was not executed.

(Btw, does the issue disappear if you move netif_tx_disable() under the 
check of netif_running() in virtnet_freeze_down()?)

Thanks

>
> Best Regards
> Ake Koomsin
>



* Re: [PATCH] virtio_net: enable tx after resuming from suspend
  2018-10-11 13:06     ` Jason Wang
@ 2018-10-12  4:30       ` ake
  2018-10-12  8:23         ` Jason Wang
  0 siblings, 1 reply; 18+ messages in thread
From: ake @ 2018-10-12  4:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel



On 2018-10-11 22:06, Jason Wang wrote:
>
>
> On 2018-10-11 18:22, ake wrote:
>>
>> On 2018-10-11 18:44, Jason Wang wrote:
>>>
>>> On 2018-10-11 15:51, Ake Koomsin wrote:
>>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>>> disabled the virtio tx before going to suspend to avoid a use after
>>>> free.
>>>> However, after resuming, it causes the virtio_net device to lose its
>>>> network connectivity.
>>>>
>>>> To solve the issue, we need to enable tx after resuming.
>>>>
>>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>>> reset")
>>>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>>>> ---
>>>>    drivers/net/virtio_net.c | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index dab504ec5e50..3453d80f5f81 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>>> virtio_device *vdev)
>>>>        }
>>>>          netif_device_attach(vi->dev);
>>>> +    netif_start_queue(vi->dev);
>>> I believe this is duplicated with netif_tx_wake_all_queues() in
>>> netif_device_attach() above?
>> Thank you for your review.
>>
>> If both netif_tx_wake_all_queues() and netif_start_queue() result in
>> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
>> conditions in netif_device_attach() is not satisfied?
> 
> Yes, maybe. One case I can see now is when the device is down, in this
> case netif_device_attach() won't try to wakeup the queue.
> 
>>   Without
>> netif_start_queue(), the virtio_net device does not resume properly
>> after waking up.
> 
> How do you trigger the issue? Just do suspend/resume?

Yes, simply suspend and resume.

Here is how I trigger the issue:

1) Start the Virtual Machine Manager GUI program.
2) Create a guest Linux OS. Make sure that the guest OS kernel is
   >= 4.12. Make sure that it uses virtio_net as its network device.
   In addition, make sure that the video adapter is VGA. Otherwise,
   waking up with the virtual power button does not work.
3) After installing the guest OS, log in, and test the network
   connectivity by pinging the host machine.
4) Suspend. After this, the screen is blank.
5) Resume by hitting the virtual power button. The login screen
   appears again.
6) Log in again. The guest loses its network connection.

In my test:
Guest: Ubuntu 16.04/18.04 with kernel 4.15.0-36-generic
Host: Ubuntu 16.04 with kernel 4.15.0-36-generic/4.4.0-137-generic

>>
>> Is it better to report this as a bug first?
> 
> Nope, you're very welcome to post patch directly.
> 
>> If I am to do more
>> investigation, what areas should I look into?
> 
> As you've figured out, you can start with why netif_tx_wake_all_queues()
> were not executed?
> 
> (Btw, does the issue disappear if you move netif_tx_disable() under the
> check of netif_running() in virtnet_freeze_down()?)

The issue disappears if I move netif_tx_disable() under the check of
netif_running() in virtnet_freeze_down(). Moving netif_tx_disable()
is probably better, as the resulting logic is consistent with the
netif_device_attach() implementation. If you are OK with this idea,
I will submit another patch.
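
A rough userspace sketch of the two orderings, for illustration only. It
is not the driver code: the struct and all model_* names are
hypothetical, the device is reduced to three booleans, and the NAPI and
work-queue handling in virtnet_freeze_down() is left out. It assumes
attach wakes the queue only for a running device that was detached, and
that detach stops it under the mirrored condition.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant net_device state. */
struct model_netdev {
	bool present;		/* models __LINK_STATE_PRESENT */
	bool running;		/* models netif_running() */
	bool tx_stopped;	/* models the driver-side "queue stopped" bit */
};

/* Models netif_device_detach(): stop tx only for a running device. */
void model_device_detach(struct model_netdev *dev)
{
	bool was_present = dev->present;

	dev->present = false;
	if (was_present && dev->running)
		dev->tx_stopped = true;
}

/* Models netif_device_attach(): wake tx only for a running device. */
void model_device_attach(struct model_netdev *dev)
{
	bool was_present = dev->present;

	dev->present = true;
	if (!was_present && dev->running)
		dev->tx_stopped = false;
}

/* Freeze path with tx disabled unconditionally (current ordering). */
void model_freeze_unconditional(struct model_netdev *dev)
{
	model_device_detach(dev);
	dev->tx_stopped = true;		/* stands in for netif_tx_disable() */
}

/* Freeze path with tx disabled only under the running check. */
void model_freeze_guarded(struct model_netdev *dev)
{
	model_device_detach(dev);
	if (dev->running)
		dev->tx_stopped = true;	/* stands in for netif_tx_disable() */
}

/* Runs one suspend/resume cycle and reports whether tx stays stopped. */
bool model_tx_stopped_after_cycle(bool running, bool guarded)
{
	struct model_netdev dev = {
		.present = true, .running = running, .tx_stopped = false,
	};

	if (guarded)
		model_freeze_guarded(&dev);
	else
		model_freeze_unconditional(&dev);
	model_device_attach(&dev);
	return dev.tx_stopped;
}
```

In this model, an interface that is down stays stopped after the cycle
with the unconditional ordering and is left untouched with the guarded
one. An interface that is up recovers in both variants, so the sketch
does not account for a failure seen with the interface up.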

> Thanks
> 
>>
>> Best Regards
>> Ake Koomsin
>>
> 

Best Regards


* Re: [PATCH] virtio_net: enable tx after resuming from suspend
  2018-10-12  4:30       ` ake
@ 2018-10-12  8:23         ` Jason Wang
  2018-10-12  9:18           ` ake
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2018-10-12  8:23 UTC (permalink / raw)
  To: ake
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel



On 2018-10-12 12:30, ake wrote:
>
> On 2018-10-11 22:06, Jason Wang wrote:
>>
>> On 2018-10-11 18:22, ake wrote:
>>> On 2018-10-11 18:44, Jason Wang wrote:
>>>> On 2018-10-11 15:51, Ake Koomsin wrote:
>>>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>>>> disabled the virtio tx before going to suspend to avoid a use after
>>>>> free.
>>>>> However, after resuming, it causes the virtio_net device to lose its
>>>>> network connectivity.
>>>>>
>>>>> To solve the issue, we need to enable tx after resuming.
>>>>>
>>>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>>>> reset")
>>>>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>>>>> ---
>>>>>     drivers/net/virtio_net.c | 1 +
>>>>>     1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>> index dab504ec5e50..3453d80f5f81 100644
>>>>> --- a/drivers/net/virtio_net.c
>>>>> +++ b/drivers/net/virtio_net.c
>>>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>>>> virtio_device *vdev)
>>>>>         }
>>>>>           netif_device_attach(vi->dev);
>>>>> +    netif_start_queue(vi->dev);
>>>> I believe this is duplicated with netif_tx_wake_all_queues() in
>>>> netif_device_attach() above?
>>> Thank you for your review.
>>>
>>> If both netif_tx_wake_all_queues() and netif_start_queue() result in
>>> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
>>> conditions in netif_device_attach() is not satisfied?
>> Yes, maybe. One case I can see now is when the device is down, in this
>> case netif_device_attach() won't try to wakeup the queue.
>>
>>>    Without
>>> netif_start_queue(), the virtio_net device does not resume properly
>>> after waking up.
>> How do you trigger the issue? Just do suspend/resume?
> Yes, simply suspend and resume.
>
> Here is how I trigger the issue:
>
> 1) Start the Virtual Machine Manager GUI program.
> 2) Create a guest Linux OS. Make sure that the guest OS kernel is
>     >= 4.12. Make sure that it uses virtio_net as its network device.
>     In addition, make sure that the video adapter is VGA. Otherwise,
>     waking up with the virtual power button does not work.
> 3) After installing the guest OS, log in, and test the network
>     connectivity by ping the host machine.
> 4) Suspend. After this, the screen is blank.
> 5) Resume by hitting the virtual power button. The login screen
>     appears again.
> 6) Log in again. The guest loses its network connection.
>
> In my test:
> Guest: Ubuntu 16.04/18.04 with kernel 4.15.0-36-generic
> Host: Ubuntu 16.04 with kernel 4.15.0-36-generic/4.4.0-137-generic

I cannot reproduce this issue if the virtio-net interface is up in the
guest before the suspend. I'm using net-next.git and qemu master. But I
can reproduce it when the virtio-net interface is down in the guest before
suspend: after resume, even if I bring it up, the network is still lost.

I think the interface is up in your case, but please confirm this.

>
>>> Is it better to report this as a bug first?
>> Nope, you're very welcome to post patch directly.
>>
>>> If I am to do more
>>> investigation, what areas should I look into?
>> As you've figured out, you can start with why netif_tx_wake_all_queues()
>> were not executed?
>>
>> (Btw, does the issue disappear if you move netif_tx_disable() under the
>> check of netif_running() in virtnet_freeze_down()?)
> The issue disappears if I move netif_tx_disable() under the check of
> netif_running() in virtnet_freeze_down(). Moving netif_tx_disable()
> is probably better as its logic is consistent with
> netif_device_attach() implementation. If you are OK with this idea,
> I will submit another patch.

I think it helps in the case where the interface is down before suspend.
But it's still unclear why it helps even if the interface is up
(netif_running() is true).

Please submit a patch, but we should figure out why it helps for an
interface that is up as well.

Thanks

>
>> Thanks
>>
>>> Best Regards
>>> Ake Koomsin
>>>
> Best Regards



* Re: [PATCH] virtio_net: enable tx after resuming from suspend
  2018-10-12  8:23         ` Jason Wang
@ 2018-10-12  9:18           ` ake
  2018-10-15 10:08             ` ake
  0 siblings, 1 reply; 18+ messages in thread
From: ake @ 2018-10-12  9:18 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel



On 2018-10-12 17:23, Jason Wang wrote:
>
>
> On 2018-10-12 12:30, ake wrote:
>>
>> On 2018-10-11 22:06, Jason Wang wrote:
>>>
>>> On 2018-10-11 18:22, ake wrote:
>>>> On 2018-10-11 18:44, Jason Wang wrote:
>>>>> On 2018-10-11 15:51, Ake Koomsin wrote:
>>>>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>>>>> disabled the virtio tx before going to suspend to avoid a use after
>>>>>> free.
>>>>>> However, after resuming, it causes the virtio_net device to lose its
>>>>>> network connectivity.
>>>>>>
>>>>>> To solve the issue, we need to enable tx after resuming.
>>>>>>
>>>>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>>>>> reset")
>>>>>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>>>>>> ---
>>>>>>     drivers/net/virtio_net.c | 1 +
>>>>>>     1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>> index dab504ec5e50..3453d80f5f81 100644
>>>>>> --- a/drivers/net/virtio_net.c
>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>>>>> virtio_device *vdev)
>>>>>>         }
>>>>>>           netif_device_attach(vi->dev);
>>>>>> +    netif_start_queue(vi->dev);
>>>>> I believe this is duplicated with netif_tx_wake_all_queues() in
>>>>> netif_device_attach() above?
>>>> Thank you for your review.
>>>>
>>>> If both netif_tx_wake_all_queues() and netif_start_queue() result in
>>>> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
>>>> conditions in netif_device_attach() is not satisfied?
>>> Yes, maybe. One case I can see now is when the device is down, in this
>>> case netif_device_attach() won't try to wakeup the queue.
>>>
>>>>    Without
>>>> netif_start_queue(), the virtio_net device does not resume properly
>>>> after waking up.
>>> How do you trigger the issue? Just do suspend/resume?
>> Yes, simply suspend and resume.
>>
>> Here is how I trigger the issue:
>>
>> 1) Start the Virtual Machine Manager GUI program.
>> 2) Create a guest Linux OS. Make sure that the guest OS kernel is
>>     >= 4.12. Make sure that it uses virtio_net as its network device.
>>     In addition, make sure that the video adapter is VGA. Otherwise,
>>     waking up with the virtual power button does not work.
>> 3) After installing the guest OS, log in, and test the network
>>     connectivity by ping the host machine.
>> 4) Suspend. After this, the screen is blank.
>> 5) Resume by hitting the virtual power button. The login screen
>>     appears again.
>> 6) Log in again. The guest loses its network connection.
>>
>> In my test:
>> Guest: Ubuntu 16.04/18.04 with kernel 4.15.0-36-generic
>> Host: Ubuntu 16.04 with kernel 4.15.0-36-generic/4.4.0-137-generic
> 
> I can not reproduce this issue if virtio-net interface is up in guest
> before the suspend. I'm using net-next.git and qemu master. But I do
> reproduce when virtio-net interface is down in guest before suspend,
> after resume, even if I make it up, the network is still lost.
> 
> I think the interface is up in your case, but please confirm this.

If you mean the interface state before I hit the suspend button,
the answer is yes. The interface is up before I suspend the guest
machine.

Note that my current QEMU version is QEMU emulator version 2.5.0
(Debian 1:2.5+dfsg-5ubuntu10.32).

I will try with net-next.git and qemu master later and see if I can
reproduce the issue.

>>
>>>> Is it better to report this as a bug first?
>>> Nope, you're very welcome to post patch directly.
>>>
>>>> If I am to do more
>>>> investigation, what areas should I look into?
>>> As you've figured out, you can start with why netif_tx_wake_all_queues()
>>> were not executed?
>>>
>>> (Btw, does the issue disappear if you move netif_tx_disable() under the
>>> check of netif_running() in virtnet_freeze_down()?)
>> The issue disappears if I move netif_tx_disable() under the check of
>> netif_running() in virtnet_freeze_down(). Moving netif_tx_disable()
>> is probably better as its logic is consistent with
>> netif_device_attach() implementation. If you are OK with this idea,
>> I will submit another patch.
> 
> I think the it helps for the case when interface is down before suspend.
> But it's still unclear why it help even if the interface is up
> (netif_running() is true).
> 
> Please submit a patch but we should figure out why it help for a up
> interface as well.
> 
> Thanks
> 
>>
>>> Thanks
>>>
>>>> Best Regards
>>>> Ake Koomsin
>>>>
>> Best Regards
> 


* Re: [PATCH] virtio_net: enable tx after resuming from suspend
  2018-10-12  9:18           ` ake
@ 2018-10-15 10:08             ` ake
  2018-10-16  8:53               ` Jason Wang
  0 siblings, 1 reply; 18+ messages in thread
From: ake @ 2018-10-15 10:08 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel



On 2018-10-12 18:18, ake wrote:
>
>
> On 2018-10-12 17:23, Jason Wang wrote:
>>
>>
>> On 2018-10-12 12:30, ake wrote:
>>>
>>> On 2018-10-11 22:06, Jason Wang wrote:
>>>>
>>>> On 2018-10-11 18:22, ake wrote:
>>>>> On 2018-10-11 18:44, Jason Wang wrote:
>>>>>> On 2018-10-11 15:51, Ake Koomsin wrote:
>>>>>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>>>>>> disabled the virtio tx before going to suspend to avoid a use after
>>>>>>> free.
>>>>>>> However, after resuming, it causes the virtio_net device to lose its
>>>>>>> network connectivity.
>>>>>>>
>>>>>>> To solve the issue, we need to enable tx after resuming.
>>>>>>>
>>>>>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>>>>>> reset")
>>>>>>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>>>>>>> ---
>>>>>>>     drivers/net/virtio_net.c | 1 +
>>>>>>>     1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>> index dab504ec5e50..3453d80f5f81 100644
>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>>>>>> virtio_device *vdev)
>>>>>>>         }
>>>>>>>           netif_device_attach(vi->dev);
>>>>>>> +    netif_start_queue(vi->dev);
>>>>>> I believe this is duplicated with netif_tx_wake_all_queues() in
>>>>>> netif_device_attach() above?
>>>>> Thank you for your review.
>>>>>
>>>>> If both netif_tx_wake_all_queues() and netif_start_queue() result in
>>>>> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
>>>>> conditions in netif_device_attach() is not satisfied?
>>>> Yes, maybe. One case I can see now is when the device is down, in this
>>>> case netif_device_attach() won't try to wakeup the queue.
>>>>
>>>>>    Without
>>>>> netif_start_queue(), the virtio_net device does not resume properly
>>>>> after waking up.
>>>> How do you trigger the issue? Just do suspend/resume?
>>> Yes, simply suspend and resume.
>>>
>>> Here is how I trigger the issue:
>>>
>>> 1) Start the Virtual Machine Manager GUI program.
>>> 2) Create a guest Linux OS. Make sure that the guest OS kernel is
>>>     >= 4.12. Make sure that it uses virtio_net as its network device.
>>>     In addition, make sure that the video adapter is VGA. Otherwise,
>>>     waking up with the virtual power button does not work.
>>> 3) After installing the guest OS, log in, and test the network
>>>     connectivity by ping the host machine.
>>> 4) Suspend. After this, the screen is blank.
>>> 5) Resume by hitting the virtual power button. The login screen
>>>     appears again.
>>> 6) Log in again. The guest loses its network connection.
>>>
>>> In my test:
>>> Guest: Ubuntu 16.04/18.04 with kernel 4.15.0-36-generic
>>> Host: Ubuntu 16.04 with kernel 4.15.0-36-generic/4.4.0-137-generic
>>
>> I can not reproduce this issue if virtio-net interface is up in guest
>> before the suspend. I'm using net-next.git and qemu master. But I do
>> reproduce when virtio-net interface is down in guest before suspend,
>> after resume, even if I make it up, the network is still lost.
>>
>> I think the interface is up in your case, but please confirm this.
> 
> If you mean the interface state before I hit the suspend button,
> the answer is yes. The interface is up before I suspend the guest
> machine.
> 
> Note that my current QEMU version is QEMU emulator version 2.5.0
> (Debian 1:2.5+dfsg-5ubuntu10.32).
> 
> I will try with net-next.git and qemu master later and see if I can
> reproduce the issue.
Update. I tried with net-next and qemu master. Interestingly, the result
is different from yours. The network is lost even if the virtio_net
interface is up before suspending.

Host: Ubuntu 16.04 with net-next kernel (default configuration)
Guest: Ubuntu 18.04 with net-next kernel (default configuration)
Qemu: master
Qemu command:
qemu-system-x86_64 -cpu host -m 2048 -enable-kvm \
-bios /usr/share/OVMF/OVMF_CODE.fd \
-drive file=/var/lib/libvirt/images/virtio_test.qcow2,if=virtio \
-netdev user,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0 \
-device VGA,id=video0,vgamem_mb=16 \
-global PIIX4_PM.disable_s3=1 \
-global PIIX4_PM.disable_s4=1 -monitor stdio

>>>
>>>>> Is it better to report this as a bug first?
>>>> Nope, you're very welcome to post patch directly.
>>>>
>>>>> If I am to do more
>>>>> investigation, what areas should I look into?
>>>> As you've figured out, you can start with why netif_tx_wake_all_queues()
>>>> were not executed?
>>>>
>>>> (Btw, does the issue disappear if you move netif_tx_disable() under the
>>>> check of netif_running() in virtnet_freeze_down()?)
>>> The issue disappears if I move netif_tx_disable() under the check of
>>> netif_running() in virtnet_freeze_down(). Moving netif_tx_disable()
>>> is probably better as its logic is consistent with
>>> netif_device_attach() implementation. If you are OK with this idea,
>>> I will submit another patch.
>>
>> I think the it helps for the case when interface is down before suspend.
>> But it's still unclear why it help even if the interface is up
>> (netif_running() is true).
>>
>> Please submit a patch but we should figure out why it help for a up
>> interface as well.
>>

I will think about the proper reason first.

>> Thanks
>>
>>>
>>>> Thanks
>>>>
>>>>> Best Regards
>>>>> Ake Koomsin
>>>>>
>>> Best Regards
>>

Best Regards


* Re: [PATCH] virtio_net: enable tx after resuming from suspend
  2018-10-15 10:08             ` ake
@ 2018-10-16  8:53               ` Jason Wang
  2018-10-16 10:15                 ` ake
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2018-10-16  8:53 UTC (permalink / raw)
  To: ake
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel


On 2018-10-15 18:08, ake wrote:
>
> On 2018-10-12 18:18, ake wrote:
>>
>> On 2018-10-12 17:23, Jason Wang wrote:
>>>
>>> On 2018-10-12 12:30, ake wrote:
>>>> On 2018-10-11 22:06, Jason Wang wrote:
>>>>> On 2018-10-11 18:22, ake wrote:
>>>>>> On 2018-10-11 18:44, Jason Wang wrote:
>>>>>>> On 2018-10-11 15:51, Ake Koomsin wrote:
>>>>>>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>>>>>>> disabled the virtio tx before going to suspend to avoid a use after
>>>>>>>> free.
>>>>>>>> However, after resuming, it causes the virtio_net device to lose its
>>>>>>>> network connectivity.
>>>>>>>>
>>>>>>>> To solve the issue, we need to enable tx after resuming.
>>>>>>>>
>>>>>>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>>>>>>> reset")
>>>>>>>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>>>>>>>> ---
>>>>>>>>      drivers/net/virtio_net.c | 1 +
>>>>>>>>      1 file changed, 1 insertion(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>>> index dab504ec5e50..3453d80f5f81 100644
>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>>>>>>> virtio_device *vdev)
>>>>>>>>          }
>>>>>>>>            netif_device_attach(vi->dev);
>>>>>>>> +    netif_start_queue(vi->dev);
>>>>>>> I believe this is duplicated with netif_tx_wake_all_queues() in
>>>>>>> netif_device_attach() above?
>>>>>> Thank you for your review.
>>>>>>
>>>>>> If both netif_tx_wake_all_queues() and netif_start_queue() result in
>>>>>> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
>>>>>> conditions in netif_device_attach() is not satisfied?
>>>>> Yes, maybe. One case I can see now is when the device is down, in this
>>>>> case netif_device_attach() won't try to wakeup the queue.
>>>>>
>>>>>>     Without
>>>>>> netif_start_queue(), the virtio_net device does not resume properly
>>>>>> after waking up.
>>>>> How do you trigger the issue? Just do suspend/resume?
>>>> Yes, simply suspend and resume.
>>>>
>>>> Here is how I trigger the issue:
>>>>
>>>> 1) Start the Virtual Machine Manager GUI program.
>>>> 2) Create a guest Linux OS. Make sure that the guest OS kernel is
>>>>      >= 4.12. Make sure that it uses virtio_net as its network device.
>>>>      In addition, make sure that the video adapter is VGA. Otherwise,
>>>>      waking up with the virtual power button does not work.
>>>> 3) After installing the guest OS, log in, and test the network
>>>>      connectivity by ping the host machine.
>>>> 4) Suspend. After this, the screen is blank.
>>>> 5) Resume by hitting the virtual power button. The login screen
>>>>      appears again.
>>>> 6) Log in again. The guest loses its network connection.
>>>>
>>>> In my test:
>>>> Guest: Ubuntu 16.04/18.04 with kernel 4.15.0-36-generic
>>>> Host: Ubuntu 16.04 with kernel 4.15.0-36-generic/4.4.0-137-generic
>>> I can not reproduce this issue if virtio-net interface is up in guest
>>> before the suspend. I'm using net-next.git and qemu master. But I do
>>> reproduce when virtio-net interface is down in guest before suspend,
>>> after resume, even if I make it up, the network is still lost.
>>>
>>> I think the interface is up in your case, but please confirm this.
>> If you mean the interface state before I hit the suspend button,
>> the answer is yes. The interface is up before I suspend the guest
>> machine.
>>
>> Note that my current QEMU version is QEMU emulator version 2.5.0
>> (Debian 1:2.5+dfsg-5ubuntu10.32).
>>
>> I will try with net-next.git and qemu master later and see if I can
>> reproduce the issue.
> Update. I tried with net-next and qemu master. Interestingly, the result
> is different from yours. The network is lost even if the virtio_net
> interface is up before suspending.
>
> Host: Ubuntu 16.04 with net-next kernel (default configuration)
> Guest: Ubuntu 18.04 with net-next kernel (default configuration)
> Qemu: master
> Qemu command:
> qemu-system-x86_64 -cpu host -m 2048 -enable-kvm \
> -bios /usr/share/OVMF/OVMF_CODE.fd \
> -drive file=/var/lib/libvirt/images/virtio_test.qcow2,if=virtio \
> -netdev user,id=hostnet0 \
> -device virtio-net-pci,netdev=hostnet0 \
> -device VGA,id=video0,vgamem_mb=16 \
> -global PIIX4_PM.disable_s3=1 \
> -global PIIX4_PM.disable_s4=1 -monitor stdio


Interesting, I just noticed you're using userspace networking. To isolate
the issue, can you retry with e.g. tap or e1000 to check whether the fault
lies with slirp or with virtio-net?

Thanks



* Re: [PATCH] virtio_net: enable tx after resuming from suspend
  2018-10-16  8:53               ` Jason Wang
@ 2018-10-16 10:15                 ` ake
  2018-10-17  6:18                   ` Jason Wang
  0 siblings, 1 reply; 18+ messages in thread
From: ake @ 2018-10-16 10:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel



On 2018-10-16 17:53, Jason Wang wrote:
>
> On 2018-10-15 18:08, ake wrote:
>>
>> On 2018-10-12 18:18, ake wrote:
>>>
>>> On 2018-10-12 17:23, Jason Wang wrote:
>>>>
>>>> On 2018-10-12 12:30, ake wrote:
>>>>> On 2018-10-11 22:06, Jason Wang wrote:
>>>>>> On 2018-10-11 18:22, ake wrote:
>>>>>>> On 2018-10-11 18:44, Jason Wang wrote:
>>>>>>>> On 2018-10-11 15:51, Ake Koomsin wrote:
>>>>>>>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>>>>>>>> reset")
>>>>>>>>> disabled the virtio tx before going to suspend to avoid a use
>>>>>>>>> after
>>>>>>>>> free.
>>>>>>>>> However, after resuming, it causes the virtio_net device to
>>>>>>>>> lose its
>>>>>>>>> network connectivity.
>>>>>>>>>
>>>>>>>>> To solve the issue, we need to enable tx after resuming.
>>>>>>>>>
>>>>>>>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine
>>>>>>>>> during
>>>>>>>>> reset")
>>>>>>>>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>>>>>>>>> ---
>>>>>>>>>      drivers/net/virtio_net.c | 1 +
>>>>>>>>>      1 file changed, 1 insertion(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>>>> index dab504ec5e50..3453d80f5f81 100644
>>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>>>>>>>> virtio_device *vdev)
>>>>>>>>>          }
>>>>>>>>>            netif_device_attach(vi->dev);
>>>>>>>>> +    netif_start_queue(vi->dev);
>>>>>>>> I believe this is duplicated with netif_tx_wake_all_queues() in
>>>>>>>> netif_device_attach() above?
>>>>>>> Thank you for your review.
>>>>>>>
>>>>>>> If both netif_tx_wake_all_queues() and netif_start_queue() result in
>>>>>>> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
>>>>>>> conditions in netif_device_attach() is not satisfied?
>>>>>> Yes, maybe. One case I can see now is when the device is down, in
>>>>>> this
>>>>>> case netif_device_attach() won't try to wakeup the queue.
>>>>>>
>>>>>>>     Without
>>>>>>> netif_start_queue(), the virtio_net device does not resume properly
>>>>>>> after waking up.
>>>>>> How do you trigger the issue? Just do suspend/resume?
>>>>> Yes, simply suspend and resume.
>>>>>
>>>>> Here is how I trigger the issue:
>>>>>
>>>>> 1) Start the Virtual Machine Manager GUI program.
>>>>> 2) Create a guest Linux OS. Make sure that the guest OS kernel is
>>>>>      >= 4.12. Make sure that it uses virtio_net as its network device.
>>>>>      In addition, make sure that the video adapter is VGA. Otherwise,
>>>>>      waking up with the virtual power button does not work.
>>>>> 3) After installing the guest OS, log in, and test the network
>>>>>      connectivity by ping the host machine.
>>>>> 4) Suspend. After this, the screen is blank.
>>>>> 5) Resume by hitting the virtual power button. The login screen
>>>>>      appears again.
>>>>> 6) Log in again. The guest loses its network connection.
>>>>>
>>>>> In my test:
>>>>> Guest: Ubuntu 16.04/18.04 with kernel 4.15.0-36-generic
>>>>> Host: Ubuntu 16.04 with kernel 4.15.0-36-generic/4.4.0-137-generic
>>>> I can not reproduce this issue if virtio-net interface is up in guest
>>>> before the suspend. I'm using net-next.git and qemu master. But I do
>>>> reproduce when virtio-net interface is down in guest before suspend,
>>>> after resume, even if I make it up, the network is still lost.
>>>>
>>>> I think the interface is up in your case, but please confirm this.
>>> If you mean the interface state before I hit the suspend button,
>>> the answer is yes. The interface is up before I suspend the guest
>>> machine.
>>>
>>> Note that my current QEMU version is QEMU emulator version 2.5.0
>>> (Debian 1:2.5+dfsg-5ubuntu10.32).
>>>
>>> I will try with net-next.git and qemu master later and see if I can
>>> reproduce the issue.
>> Update. I tried with net-next and qemu master. Interestingly, the result
>> is different from yours. The network is lost even if the virtio_net
>> interface is up before suspending.
>>
>> Host: Ubuntu 16.04 with net-next kernel (default configuration)
>> Guest: Ubuntu 18.04 with net-next kernel (default configuration)
>> Qemu: master
>> Qemu command:
>> qemu-system-x86_64 -cpu host -m 2048 -enable-kvm \
>> -bios /usr/share/OVMF/OVMF_CODE.fd \
>> -drive file=/var/lib/libvirt/images/virtio_test.qcow2,if=virtio \
>> -netdev user,id=hostnet0 \
>> -device virtio-net-pci,netdev=hostnet0 \
>> -device VGA,id=video0,vgamem_mb=16 \
>> -global PIIX4_PM.disable_s3=1 \
>> -global PIIX4_PM.disable_s4=1 -monitor stdio
> 
> 
> Interesting, I just noticed you're using userspace networking. To isolate
> the issue, can you retry with e.g. tap or e1000 to make sure it's not a
> fault of slirp or virtio-net?

I will try.

> Thanks
> 

There is another thing that I want to discuss. I notice that
netif_device_detach() already sets __QUEUE_STATE_DRV_XOFF if the
network interface is running. Isn't calling netif_tx_disable() after
netif_device_detach() redundant in that case? If the goal is to
serialize the tx routine, would netif_tx_lock() and netif_tx_unlock()
be more appropriate? Like this:

netif_tx_lock(vi->dev);
netif_device_detach(vi->dev);
netif_tx_unlock(vi->dev);

Currently, netif_tx_disable() seems to disturb the symmetry of
netif_device_detach() and netif_device_attach(). That is the reason
why you can reproduce the problem when the interface is down before
suspending.


* Re: [PATCH] virtio_net: enable tx after resuming from suspend
  2018-10-16 10:15                 ` ake
@ 2018-10-17  6:18                   ` Jason Wang
  2018-10-17  7:59                     ` [PATCH v2] virtio_net: avoid using netif_tx_disable() for serializing tx routine Ake Koomsin
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2018-10-17  6:18 UTC (permalink / raw)
  To: ake
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel


On 2018/10/16 6:15 PM, ake wrote:
>
> On 2018-10-16 17:53, Jason Wang wrote:
>> On 2018/10/15 6:08 PM, ake wrote:
>>> On 2018-10-12 18:18, ake wrote:
>>>> On 2018-10-12 17:23, Jason Wang wrote:
>>>>> On 2018-10-12 12:30, ake wrote:
>>>>>> On 2018-10-11 22:06, Jason Wang wrote:
>>>>>>> On 2018-10-11 18:22, ake wrote:
>>>>>>>> On 2018-10-11 18:44, Jason Wang wrote:
>>>>>>>>> On 2018-10-11 15:51, Ake Koomsin wrote:
>>>>>>>>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>>>>>>>>> reset")
>>>>>>>>>> disabled the virtio tx before going to suspend to avoid a use
>>>>>>>>>> after
>>>>>>>>>> free.
>>>>>>>>>> However, after resuming, it causes the virtio_net device to
>>>>>>>>>> lose its
>>>>>>>>>> network connectivity.
>>>>>>>>>>
>>>>>>>>>> To solve the issue, we need to enable tx after resuming.
>>>>>>>>>>
>>>>>>>>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine
>>>>>>>>>> during
>>>>>>>>>> reset")
>>>>>>>>>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>>>>>>>>>> ---
>>>>>>>>>>       drivers/net/virtio_net.c | 1 +
>>>>>>>>>>       1 file changed, 1 insertion(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>>>>> index dab504ec5e50..3453d80f5f81 100644
>>>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>>>>>>>>> virtio_device *vdev)
>>>>>>>>>>           }
>>>>>>>>>>             netif_device_attach(vi->dev);
>>>>>>>>>> +    netif_start_queue(vi->dev);
>>>>>>>>> I believe this is duplicated with netif_tx_wake_all_queues() in
>>>>>>>>> netif_device_attach() above?
>>>>>>>> Thank you for your review.
>>>>>>>>
>>>>>>>> If both netif_tx_wake_all_queues() and netif_start_queue() result in
>>>>>>>> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
>>>>>>>> conditions in netif_device_attach() is not satisfied?
>>>>>>> Yes, maybe. One case I can see now is when the device is down, in
>>>>>>> this
>>>>>>> case netif_device_attach() won't try to wakeup the queue.
>>>>>>>
>>>>>>>>      Without
>>>>>>>> netif_start_queue(), the virtio_net device does not resume properly
>>>>>>>> after waking up.
>>>>>>> How do you trigger the issue? Just do suspend/resume?
>>>>>> Yes, simply suspend and resume.
>>>>>>
>>>>>> Here is how I trigger the issue:
>>>>>>
>>>>>> 1) Start the Virtual Machine Manager GUI program.
>>>>>> 2) Create a guest Linux OS. Make sure that the guest OS kernel is
>>>>>>       >= 4.12. Make sure that it uses virtio_net as its network device.
>>>>>>       In addition, make sure that the video adapter is VGA. Otherwise,
>>>>>>       waking up with the virtual power button does not work.
>>>>>> 3) After installing the guest OS, log in, and test the network
>>>>>>       connectivity by ping the host machine.
>>>>>> 4) Suspend. After this, the screen is blank.
>>>>>> 5) Resume by hitting the virtual power button. The login screen
>>>>>>       appears again.
>>>>>> 6) Log in again. The guest loses its network connection.
>>>>>>
>>>>>> In my test:
>>>>>> Guest: Ubuntu 16.04/18.04 with kernel 4.15.0-36-generic
>>>>>> Host: Ubuntu 16.04 with kernel 4.15.0-36-generic/4.4.0-137-generic
>>>>> I can not reproduce this issue if virtio-net interface is up in guest
>>>>> before the suspend. I'm using net-next.git and qemu master. But I do
>>>>> reproduce when virtio-net interface is down in guest before suspend,
>>>>> after resume, even if I make it up, the network is still lost.
>>>>>
>>>>> I think the interface is up in your case, but please confirm this.
>>>> If you mean the interface state before I hit the suspend button,
>>>> the answer is yes. The interface is up before I suspend the guest
>>>> machine.
>>>>
>>>> Note that my current QEMU version is QEMU emulator version 2.5.0
>>>> (Debian 1:2.5+dfsg-5ubuntu10.32).
>>>>
>>>> I will try with net-next.git and qemu master later and see if I can
>>>> reproduce the issue.
>>> Update. I tried with net-next and qemu master. Interestingly, the result
>>> is different from yours. The network is lost even if the virtio_net
>>> interface is up before suspending.
>>>
>>> Host: Ubuntu 16.04 with net-next kernel (default configuration)
>>> Guest: Ubuntu 18.04 with net-next kernel (default configuration)
>>> Qemu: master
>>> Qemu command:
>>> qemu-system-x86_64 -cpu host -m 2048 -enable-kvm \
>>> -bios /usr/share/OVMF/OVMF_CODE.fd \
>>> -drive file=/var/lib/libvirt/images/virtio_test.qcow2,if=virtio \
>>> -netdev user,id=hostnet0 \
>>> -device virtio-net-pci,netdev=hostnet0 \
>>> -device VGA,id=video0,vgamem_mb=16 \
>>> -global PIIX4_PM.disable_s3=1 \
>>> -global PIIX4_PM.disable_s4=1 -monitor stdio
>>
>> Interesting, I just noticed you're using userspace networking. To isolate
>> the issue, can you retry with e.g. tap or e1000 to make sure it's not a
>> fault of slirp or virtio-net?
> I will try.
>
>> Thanks
>>
> There is another thing that I want to discuss. I notice that
> netif_device_detach() already sets __QUEUE_STATE_DRV_XOFF if the
> network interface is running. Isn't calling netif_tx_disable() after
> netif_device_detach() redundant in that case? If the goal is to
> serialize the tx routine, would netif_tx_lock() and netif_tx_unlock()
> be more appropriate? Like this:
>
> netif_tx_lock(vi->dev);
> netif_device_detach(vi->dev);
> netif_tx_unlock(vi->dev);
>
> Currently, netif_tx_disable() seems to disturb the symmetry of
> netif_device_detach() and netif_device_attach(). That is the reason
> why you can reproduce the problem when the interface is down before
> suspending.


Yes I agree.

Thanks



* [PATCH v2] virtio_net: avoid using netif_tx_disable() for serializing tx routine
  2018-10-17  6:18                   ` Jason Wang
@ 2018-10-17  7:59                     ` Ake Koomsin
  2018-10-17  9:02                       ` Jason Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Ake Koomsin @ 2018-10-17  7:59 UTC (permalink / raw)
  To: Jason Wang
  Cc: ake, Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel

Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
introduces netif_tx_disable() after netif_device_detach() in order to
avoid use-after-free of tx queues. However, there are two issues.

1) Its operation is redundant with netif_device_detach() if the interface
   is running.
2) In case the interface is not running before suspending and
   resuming, the tx does not get resumed by netif_device_attach().
   This results in losing network connectivity.

It is better to use netif_tx_lock()/netif_tx_unlock() instead for
serializing tx routine during reset. This also preserves the symmetry
of netif_device_detach() and netif_device_attach().

Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
Signed-off-by: Ake Koomsin <ake@igel.co.jp>
---
 drivers/net/virtio_net.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3f5aa59c37b7..41ccf9c994a4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2267,8 +2267,9 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 	/* Make sure no work handler is accessing the device */
 	flush_work(&vi->config_work);
 
+	netif_tx_lock(vi->dev);
 	netif_device_detach(vi->dev);
-	netif_tx_disable(vi->dev);
+	netif_tx_unlock(vi->dev);
 	cancel_delayed_work_sync(&vi->refill);
 
 	if (netif_running(vi->dev)) {
@@ -2304,7 +2305,9 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 		}
 	}
 
+	netif_tx_lock(vi->dev);
 	netif_device_attach(vi->dev);
+	netif_tx_unlock(vi->dev);
 	return err;
 }
 
-- 
2.19.1



* Re: [PATCH v2] virtio_net: avoid using netif_tx_disable() for serializing tx routine
  2018-10-17  7:59                     ` [PATCH v2] virtio_net: avoid using netif_tx_disable() for serializing tx routine Ake Koomsin
@ 2018-10-17  9:02                       ` Jason Wang
  2018-10-17 10:44                         ` [PATCH v3] " Ake Koomsin
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2018-10-17  9:02 UTC (permalink / raw)
  To: Ake Koomsin
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel


On 2018/10/17 3:59 PM, Ake Koomsin wrote:
> Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> introduces netif_tx_disable() after netif_device_detach() in order to
> avoid use-after-free of tx queues. However, there are two issues.
>
> 1) Its operation is redundant with netif_device_detach() if the interface
>     is running.
> 2) In case the interface is not running before suspending and
>     resuming, the tx does not get resumed by netif_device_attach().
>     This results in losing network connectivity.
>
> It is better to use netif_tx_lock()/netif_tx_unlock() instead for
> serializing tx routine during reset. This also preserves the symmetry
> of netif_device_detach() and netif_device_attach().
>
> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
> ---
>   drivers/net/virtio_net.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3f5aa59c37b7..41ccf9c994a4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2267,8 +2267,9 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>   	/* Make sure no work handler is accessing the device */
>   	flush_work(&vi->config_work);
>   
> +	netif_tx_lock(vi->dev);
>   	netif_device_detach(vi->dev);
> -	netif_tx_disable(vi->dev);
> +	netif_tx_unlock(vi->dev);


Sorry for not catching this earlier. I think we should use
netif_tx_lock_bh() to prevent start_xmit() from running under bh.

Thanks


>   	cancel_delayed_work_sync(&vi->refill);
>   
>   	if (netif_running(vi->dev)) {
> @@ -2304,7 +2305,9 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>   		}
>   	}
>   
> +	netif_tx_lock(vi->dev);
>   	netif_device_attach(vi->dev);
> +	netif_tx_unlock(vi->dev);
>   	return err;
>   }
>   


* [PATCH v3] virtio_net: avoid using netif_tx_disable() for serializing tx routine
  2018-10-17  9:02                       ` Jason Wang
@ 2018-10-17 10:44                         ` Ake Koomsin
  2018-10-17 12:30                           ` Jason Wang
                                             ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Ake Koomsin @ 2018-10-17 10:44 UTC (permalink / raw)
  To: Jason Wang
  Cc: ake, Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel

Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
introduces netif_tx_disable() after netif_device_detach() in order to
avoid use-after-free of tx queues. However, there are two issues.

1) Its operation is redundant with netif_device_detach() in case the
   interface is running.
2) In case the interface is not running before suspending and
   resuming, the tx does not get resumed by netif_device_attach().
   This results in losing network connectivity.

It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for
serializing tx routine during reset. This also preserves the symmetry
of netif_device_detach() and netif_device_attach().

Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
Signed-off-by: Ake Koomsin <ake@igel.co.jp>
---
 drivers/net/virtio_net.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3f5aa59c37b7..3e2c041d76ac 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2267,8 +2267,9 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 	/* Make sure no work handler is accessing the device */
 	flush_work(&vi->config_work);
 
+	netif_tx_lock_bh(vi->dev);
 	netif_device_detach(vi->dev);
-	netif_tx_disable(vi->dev);
+	netif_tx_unlock_bh(vi->dev);
 	cancel_delayed_work_sync(&vi->refill);
 
 	if (netif_running(vi->dev)) {
@@ -2304,7 +2305,9 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 		}
 	}
 
+	netif_tx_lock_bh(vi->dev);
 	netif_device_attach(vi->dev);
+	netif_tx_unlock_bh(vi->dev);
 	return err;
 }
 
-- 
2.19.1



* Re: [PATCH v3] virtio_net: avoid using netif_tx_disable() for serializing tx routine
  2018-10-17 10:44                         ` [PATCH v3] " Ake Koomsin
@ 2018-10-17 12:30                           ` Jason Wang
  2018-10-17 15:09                           ` Michael S. Tsirkin
  2018-10-18  5:30                           ` David Miller
  2 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2018-10-17 12:30 UTC (permalink / raw)
  To: Ake Koomsin
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel


On 2018/10/17 6:44 PM, Ake Koomsin wrote:
> Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> introduces netif_tx_disable() after netif_device_detach() in order to
> avoid use-after-free of tx queues. However, there are two issues.
>
> 1) Its operation is redundant with netif_device_detach() in case the
>     interface is running.
> 2) In case the interface is not running before suspending and
>     resuming, the tx does not get resumed by netif_device_attach().
>     This results in losing network connectivity.
>
> It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for
> serializing tx routine during reset. This also preserves the symmetry
> of netif_device_detach() and netif_device_attach().
>
> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
> ---
>   drivers/net/virtio_net.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3f5aa59c37b7..3e2c041d76ac 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2267,8 +2267,9 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>   	/* Make sure no work handler is accessing the device */
>   	flush_work(&vi->config_work);
>   
> +	netif_tx_lock_bh(vi->dev);
>   	netif_device_detach(vi->dev);
> -	netif_tx_disable(vi->dev);
> +	netif_tx_unlock_bh(vi->dev);
>   	cancel_delayed_work_sync(&vi->refill);
>   
>   	if (netif_running(vi->dev)) {
> @@ -2304,7 +2305,9 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>   		}
>   	}
>   
> +	netif_tx_lock_bh(vi->dev);
>   	netif_device_attach(vi->dev);
> +	netif_tx_unlock_bh(vi->dev);
>   	return err;
>   }
>   


Acked-by: Jason Wang <jasowang@redhat.com>

Thanks



* Re: [PATCH v3] virtio_net: avoid using netif_tx_disable() for serializing tx routine
  2018-10-17 10:44                         ` [PATCH v3] " Ake Koomsin
  2018-10-17 12:30                           ` Jason Wang
@ 2018-10-17 15:09                           ` Michael S. Tsirkin
  2018-10-18  3:25                             ` ake
  2018-10-18  5:30                           ` David Miller
  2 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2018-10-17 15:09 UTC (permalink / raw)
  To: Ake Koomsin
  Cc: Jason Wang, David S. Miller, virtualization, netdev, linux-kernel

On Wed, Oct 17, 2018 at 07:44:12PM +0900, Ake Koomsin wrote:
> Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> introduces netif_tx_disable() after netif_device_detach() in order to
> avoid use-after-free of tx queues. However, there are two issues.
> 
> 1) Its operation is redundant with netif_device_detach() in case the
>    interface is running.
> 2) In case the interface is not running before suspending and
>    resuming, the tx does not get resumed by netif_device_attach().
>    This results in losing network connectivity.
> 
> It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for
> serializing tx routine during reset. This also preserves the symmetry
> of netif_device_detach() and netif_device_attach().
> 
> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> Signed-off-by: Ake Koomsin <ake@igel.co.jp>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

Thanks a lot for debugging!
Seems like stable material to me, right?

> ---
>  drivers/net/virtio_net.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3f5aa59c37b7..3e2c041d76ac 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2267,8 +2267,9 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>  	/* Make sure no work handler is accessing the device */
>  	flush_work(&vi->config_work);
>  
> +	netif_tx_lock_bh(vi->dev);
>  	netif_device_detach(vi->dev);
> -	netif_tx_disable(vi->dev);
> +	netif_tx_unlock_bh(vi->dev);
>  	cancel_delayed_work_sync(&vi->refill);
>  
>  	if (netif_running(vi->dev)) {
> @@ -2304,7 +2305,9 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>  		}
>  	}
>  
> +	netif_tx_lock_bh(vi->dev);
>  	netif_device_attach(vi->dev);
> +	netif_tx_unlock_bh(vi->dev);
>  	return err;
>  }
>  
> -- 
> 2.19.1


* Re: [PATCH v3] virtio_net: avoid using netif_tx_disable() for serializing tx routine
  2018-10-17 15:09                           ` Michael S. Tsirkin
@ 2018-10-18  3:25                             ` ake
  0 siblings, 0 replies; 18+ messages in thread
From: ake @ 2018-10-18  3:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, David S. Miller, virtualization, netdev, linux-kernel



On 2018/10/18 0:09, Michael S. Tsirkin wrote:
> On Wed, Oct 17, 2018 at 07:44:12PM +0900, Ake Koomsin wrote:
>> Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>> introduces netif_tx_disable() after netif_device_detach() in order to
>> avoid use-after-free of tx queues. However, there are two issues.
>>
>> 1) Its operation is redundant with netif_device_detach() in case the
>>    interface is running.
>> 2) In case the interface is not running before suspending and
>>    resuming, the tx does not get resumed by netif_device_attach().
>>    This results in losing network connectivity.
>>
>> It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for
>> serializing tx routine during reset. This also preserves the symmetry
>> of netif_device_detach() and netif_device_attach().
>>
>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Thanks a lot for debugging!
> Seems like stable material to me, right?

Yes. With this patch, we avoid losing network connectivity because tx
does not get re-enabled in some situations. It also avoids the redundant
operation between netif_device_detach() and netif_tx_disable().

I tested the patch on Linux net-next and QEMU master by suspending and
resuming the virtual machine repeatedly. The network works without
problems, and there has been no loss of connectivity so far. I tested
with both user-mode networking and a tap interface.

Best Regards

>> ---
>>  drivers/net/virtio_net.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 3f5aa59c37b7..3e2c041d76ac 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -2267,8 +2267,9 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>>  	/* Make sure no work handler is accessing the device */
>>  	flush_work(&vi->config_work);
>>  
>> +	netif_tx_lock_bh(vi->dev);
>>  	netif_device_detach(vi->dev);
>> -	netif_tx_disable(vi->dev);
>> +	netif_tx_unlock_bh(vi->dev);
>>  	cancel_delayed_work_sync(&vi->refill);
>>  
>>  	if (netif_running(vi->dev)) {
>> @@ -2304,7 +2305,9 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>>  		}
>>  	}
>>  
>> +	netif_tx_lock_bh(vi->dev);
>>  	netif_device_attach(vi->dev);
>> +	netif_tx_unlock_bh(vi->dev);
>>  	return err;
>>  }
>>  
>> -- 
>> 2.19.1


* Re: [PATCH v3] virtio_net: avoid using netif_tx_disable() for serializing tx routine
  2018-10-17 10:44                         ` [PATCH v3] " Ake Koomsin
  2018-10-17 12:30                           ` Jason Wang
  2018-10-17 15:09                           ` Michael S. Tsirkin
@ 2018-10-18  5:30                           ` David Miller
  2 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2018-10-18  5:30 UTC (permalink / raw)
  To: ake; +Cc: jasowang, mst, virtualization, netdev, linux-kernel

From: Ake Koomsin <ake@igel.co.jp>
Date: Wed, 17 Oct 2018 19:44:12 +0900

> Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> introduces netif_tx_disable() after netif_device_detach() in order to
> avoid use-after-free of tx queues. However, there are two issues.
> 
> 1) Its operation is redundant with netif_device_detach() in case the
>    interface is running.
> 2) In case the interface is not running before suspending and
>    resuming, the tx does not get resumed by netif_device_attach().
>    This results in losing network connectivity.
> 
> It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for
> serializing tx routine during reset. This also preserves the symmetry
> of netif_device_detach() and netif_device_attach().
> 
> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> Signed-off-by: Ake Koomsin <ake@igel.co.jp>

Applied and queued up for -stable.


Thread overview: 18+ messages
2018-10-11  7:51 [PATCH] virtio_net: enable tx after resuming from suspend Ake Koomsin
2018-10-11  9:44 ` Jason Wang
2018-10-11 10:22   ` ake
2018-10-11 13:06     ` Jason Wang
2018-10-12  4:30       ` ake
2018-10-12  8:23         ` Jason Wang
2018-10-12  9:18           ` ake
2018-10-15 10:08             ` ake
2018-10-16  8:53               ` Jason Wang
2018-10-16 10:15                 ` ake
2018-10-17  6:18                   ` Jason Wang
2018-10-17  7:59                     ` [PATCH v2] virtio_net: avoid using netif_tx_disable() for serializing tx routine Ake Koomsin
2018-10-17  9:02                       ` Jason Wang
2018-10-17 10:44                         ` [PATCH v3] " Ake Koomsin
2018-10-17 12:30                           ` Jason Wang
2018-10-17 15:09                           ` Michael S. Tsirkin
2018-10-18  3:25                             ` ake
2018-10-18  5:30                           ` David Miller
