linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
@ 2016-11-30 12:10 Yunjian Wang
  2016-11-30 13:07 ` Jason Wang
  2016-11-30 13:40 ` Michael S. Tsirkin
  0 siblings, 2 replies; 13+ messages in thread
From: Yunjian Wang @ 2016-11-30 12:10 UTC (permalink / raw)
  To: mst, jasowang, netdev, linux-kernel; +Cc: caihe, wangyunjian

When we meet an error(err=-EBADFD) recvmsg, the error handling in vhost
handle_rx() will continue. This will cause a soft CPU lockup in vhost thread.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
---
 drivers/vhost/net.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc128a..edc470b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
 			pr_debug("Discarded rx packet: "
 				 " len %d, expected %zd\n", err, sock_len);
 			vhost_discard_vq_desc(vq, headcount);
+			/* Don't continue to do, when meet errors. */
+			if (err < 0)
+				goto out;
 			continue;
 		}
 		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
-- 
1.9.5.msysgit.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
  2016-11-30 12:10 [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors Yunjian Wang
@ 2016-11-30 13:07 ` Jason Wang
  2016-11-30 13:43   ` Michael S. Tsirkin
  2016-11-30 13:40 ` Michael S. Tsirkin
  1 sibling, 1 reply; 13+ messages in thread
From: Jason Wang @ 2016-11-30 13:07 UTC (permalink / raw)
  To: Yunjian Wang, mst, netdev, linux-kernel; +Cc: caihe



On 2016年11月30日 20:10, Yunjian Wang wrote:
> When we meet an error(err=-EBADFD) recvmsg, the error handling in vhost
> handle_rx() will continue. This will cause a soft CPU lockup in vhost thread.
>
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> ---
>   drivers/vhost/net.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 5dc128a..edc470b 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
>   			pr_debug("Discarded rx packet: "
>   				 " len %d, expected %zd\n", err, sock_len);
>   			vhost_discard_vq_desc(vq, headcount);
> +			/* Don't continue to do, when meet errors. */
> +			if (err < 0)
> +				goto out;
>   			continue;
>   		}
>   		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */

Acked-by: Jason Wang <jasowang@redhat.com>

We may want to rename vhost_discard_vq_desc() in the future, since it 
does not discard the desc in fact.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
  2016-11-30 12:10 [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors Yunjian Wang
  2016-11-30 13:07 ` Jason Wang
@ 2016-11-30 13:40 ` Michael S. Tsirkin
  2016-12-01  2:48   ` wangyunjian
  2016-12-01  3:15   ` Jason Wang
  1 sibling, 2 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2016-11-30 13:40 UTC (permalink / raw)
  To: Yunjian Wang; +Cc: jasowang, netdev, linux-kernel, caihe

On Wed, Nov 30, 2016 at 08:10:57PM +0800, Yunjian Wang wrote:
> When we meet an error(err=-EBADFD) recvmsg,

How do you get EBADFD? Won't vhost_net_rx_peek_head_len
return 0 in this case, breaking the loop?

> the error handling in vhost
> handle_rx() will continue. This will cause a soft CPU lockup in vhost thread.
> 
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> ---
>  drivers/vhost/net.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 5dc128a..edc470b 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
>  			pr_debug("Discarded rx packet: "
>  				 " len %d, expected %zd\n", err, sock_len);
>  			vhost_discard_vq_desc(vq, headcount);
> +			/* Don't continue to do, when meet errors. */
> +			if (err < 0)
> +				goto out;

You might get e.g. EAGAIN and I think you need to retry
in this case.

>  			continue;
>  		}
>  		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
> -- 
> 1.9.5.msysgit.1
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
  2016-11-30 13:07 ` Jason Wang
@ 2016-11-30 13:43   ` Michael S. Tsirkin
  0 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2016-11-30 13:43 UTC (permalink / raw)
  To: Jason Wang; +Cc: Yunjian Wang, netdev, linux-kernel, caihe

On Wed, Nov 30, 2016 at 09:07:16PM +0800, Jason Wang wrote:
> 
> 
> On 2016年11月30日 20:10, Yunjian Wang wrote:
> > When we meet an error(err=-EBADFD) recvmsg, the error handling in vhost
> > handle_rx() will continue. This will cause a soft CPU lockup in vhost thread.
> > 
> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > ---
> >   drivers/vhost/net.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 5dc128a..edc470b 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
> >   			pr_debug("Discarded rx packet: "
> >   				 " len %d, expected %zd\n", err, sock_len);
> >   			vhost_discard_vq_desc(vq, headcount);
> > +			/* Don't continue to do, when meet errors. */
> > +			if (err < 0)
> > +				goto out;
> >   			continue;
> >   		}
> >   		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
> 
> Acked-by: Jason Wang <jasowang@redhat.com>
> 
> We may want to rename vhost_discard_vq_desc() in the future, since it does
> not discard the desc in fact.

To me this looks a bit too aggressive. I would also like commit log
to explain better what is going on, to make sure we are
not just treating the symptoms.
-- 
MST

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
  2016-11-30 13:40 ` Michael S. Tsirkin
@ 2016-12-01  2:48   ` wangyunjian
  2016-12-01  3:13     ` Jason Wang
  2016-12-01  3:21     ` Michael S. Tsirkin
  2016-12-01  3:15   ` Jason Wang
  1 sibling, 2 replies; 13+ messages in thread
From: wangyunjian @ 2016-12-01  2:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: jasowang, netdev, linux-kernel, caihe


>-----Original Message-----
>From: Michael S. Tsirkin [mailto:mst@redhat.com] 
>Sent: Wednesday, November 30, 2016 9:41 PM
>To: wangyunjian
>Cc: jasowang@redhat.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; caihe
>Subject: Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
>
>On Wed, Nov 30, 2016 at 08:10:57PM +0800, Yunjian Wang wrote:
>> When we meet an error(err=-EBADFD) recvmsg,
>
>How do you get EBADFD? Won't vhost_net_rx_peek_head_len
>return 0 in this case, breaking the loop?

We started many guest VMs while attaching/detaching some virtio-net nics for loop. 
The soft lockup might happened. The err is -EBADFD.

meesage log:
kernel:[609608.510180]BUG: soft lockup - CPU#18 stuck for 23s! [vhost-60898:126093]
call trace:
[<fffffffa0132967>]vhost_get_vq_desc+0x1e7/0x984 [vhost]
[<fffffffa02037e6>]handle_rx+0x226/0x810 [vhost_net]
[<fffffffa0203de5>]handle_rx_net+0x15/0x20 [vhost_net]
[<fffffffa013160b>]vhost_worker+0xfb/0x1e0 [vhost]
[<fffffffa0131510>]? vhost_dev_reset_owner+0x50/0x50 [vhost]
[<fffffff810a5c7f>]kthread+0xcf/0xe0
[<fffffff810a5bb0>]? kthread_create_on_node+0x140/0x140
[<fffffff81648898>]ret_from_fork+0x58/0x90
[<fffffff810a5bb0>]? kthread_create_on_node+0x140/0x140

>> the error handling in vhost
>> handle_rx() will continue. This will cause a soft CPU lockup in vhost thread.
>> 
>> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
>> ---
>>  drivers/vhost/net.c | 3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 5dc128a..edc470b 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
>>  			pr_debug("Discarded rx packet: "
>>  				 " len %d, expected %zd\n", err, sock_len);
>>  			vhost_discard_vq_desc(vq, headcount);
>> +			/* Don't continue to do, when meet errors. */
>> +			if (err < 0)
>> +				goto out;
>
>You might get e.g. EAGAIN and I think you need to retry
>in this case.
>
>>  			continue;
>>  		}
>>  		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
>> -- 
>> 1.9.5.msysgit.1
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
  2016-12-01  2:48   ` wangyunjian
@ 2016-12-01  3:13     ` Jason Wang
  2016-12-01  3:21     ` Michael S. Tsirkin
  1 sibling, 0 replies; 13+ messages in thread
From: Jason Wang @ 2016-12-01  3:13 UTC (permalink / raw)
  To: wangyunjian, Michael S. Tsirkin; +Cc: netdev, linux-kernel, caihe



On 2016年12月01日 10:48, wangyunjian wrote:
>> -----Original Message-----
>> From: Michael S. Tsirkin [mailto:mst@redhat.com]
>> Sent: Wednesday, November 30, 2016 9:41 PM
>> To: wangyunjian
>> Cc: jasowang@redhat.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; caihe
>> Subject: Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
>>
>> On Wed, Nov 30, 2016 at 08:10:57PM +0800, Yunjian Wang wrote:
>>> When we meet an error(err=-EBADFD) recvmsg,
>> How do you get EBADFD? Won't vhost_net_rx_peek_head_len
>> return 0 in this case, breaking the loop?
> We started many guest VMs while attaching/detaching some virtio-net nics for loop.
> The soft lockup might happened. The err is -EBADFD.

How did you do the attaching/detaching? AFAIK, the -EBADFD can only 
happens when you deleting tun device during vhost_net transmission.

>
> meesage log:
> kernel:[609608.510180]BUG: soft lockup - CPU#18 stuck for 23s! [vhost-60898:126093]
> call trace:
> [<fffffffa0132967>]vhost_get_vq_desc+0x1e7/0x984 [vhost]
> [<fffffffa02037e6>]handle_rx+0x226/0x810 [vhost_net]
> [<fffffffa0203de5>]handle_rx_net+0x15/0x20 [vhost_net]
> [<fffffffa013160b>]vhost_worker+0xfb/0x1e0 [vhost]
> [<fffffffa0131510>]? vhost_dev_reset_owner+0x50/0x50 [vhost]
> [<fffffff810a5c7f>]kthread+0xcf/0xe0
> [<fffffff810a5bb0>]? kthread_create_on_node+0x140/0x140
> [<fffffff81648898>]ret_from_fork+0x58/0x90
> [<fffffff810a5bb0>]? kthread_create_on_node+0x140/0x140
>
>>> the error handling in vhost
>>> handle_rx() will continue. This will cause a soft CPU lockup in vhost thread.
>>>
>>> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
>>> ---
>>>   drivers/vhost/net.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>> index 5dc128a..edc470b 100644
>>> --- a/drivers/vhost/net.c
>>> +++ b/drivers/vhost/net.c
>>> @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
>>>   			pr_debug("Discarded rx packet: "
>>>   				 " len %d, expected %zd\n", err, sock_len);
>>>   			vhost_discard_vq_desc(vq, headcount);
>>> +			/* Don't continue to do, when meet errors. */
>>> +			if (err < 0)
>>> +				goto out;
>> You might get e.g. EAGAIN and I think you need to retry
>> in this case.
>>
>>>   			continue;
>>>   		}
>>>   		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
>>> -- 
>>> 1.9.5.msysgit.1
>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
  2016-11-30 13:40 ` Michael S. Tsirkin
  2016-12-01  2:48   ` wangyunjian
@ 2016-12-01  3:15   ` Jason Wang
  1 sibling, 0 replies; 13+ messages in thread
From: Jason Wang @ 2016-12-01  3:15 UTC (permalink / raw)
  To: Michael S. Tsirkin, Yunjian Wang; +Cc: netdev, linux-kernel, caihe



On 2016年11月30日 21:40, Michael S. Tsirkin wrote:
> On Wed, Nov 30, 2016 at 08:10:57PM +0800, Yunjian Wang wrote:
>> When we meet an error(err=-EBADFD) recvmsg,
> How do you get EBADFD? Won't vhost_net_rx_peek_head_len
> return 0 in this case, breaking the loop?
>
>> the error handling in vhost
>> handle_rx() will continue. This will cause a soft CPU lockup in vhost thread.
>>
>> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
>> ---
>>   drivers/vhost/net.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 5dc128a..edc470b 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
>>   			pr_debug("Discarded rx packet: "
>>   				 " len %d, expected %zd\n", err, sock_len);
>>   			vhost_discard_vq_desc(vq, headcount);
>> +			/* Don't continue to do, when meet errors. */
>> +			if (err < 0)
>> +				goto out;
> You might get e.g. EAGAIN and I think you need to retry
> in this case.

Yes.

>>   			continue;
>>   		}
>>   		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
>> -- 
>> 1.9.5.msysgit.1
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
  2016-12-01  2:48   ` wangyunjian
  2016-12-01  3:13     ` Jason Wang
@ 2016-12-01  3:21     ` Michael S. Tsirkin
  2016-12-01  3:26       ` Jason Wang
  1 sibling, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2016-12-01  3:21 UTC (permalink / raw)
  To: wangyunjian; +Cc: jasowang, netdev, linux-kernel, caihe

On Thu, Dec 01, 2016 at 02:48:59AM +0000, wangyunjian wrote:
> 
> >-----Original Message-----
> >From: Michael S. Tsirkin [mailto:mst@redhat.com] 
> >Sent: Wednesday, November 30, 2016 9:41 PM
> >To: wangyunjian
> >Cc: jasowang@redhat.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; caihe
> >Subject: Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
> >
> >On Wed, Nov 30, 2016 at 08:10:57PM +0800, Yunjian Wang wrote:
> >> When we meet an error(err=-EBADFD) recvmsg,
> >
> >How do you get EBADFD? Won't vhost_net_rx_peek_head_len
> >return 0 in this case, breaking the loop?
> 
> We started many guest VMs while attaching/detaching some virtio-net nics for loop. 
> The soft lockup might happened. The err is -EBADFD.
> 

OK, I'd like to figure out what happened here. why don't
we get 0 when we peek at the head?

EBADFD is from here:
        struct tun_struct *tun = __tun_get(tfile);
...
        if (!tun)
                return -EBADFD;

but then:
static int tun_peek_len(struct socket *sock)
{       

...

        struct tun_struct *tun;
...
        
        tun = __tun_get(tfile);
        if (!tun) 
                return 0;


so peek len should return 0.

then while will exit:
        while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk)))
...


> meesage log:
> kernel:[609608.510180]BUG: soft lockup - CPU#18 stuck for 23s! [vhost-60898:126093]
> call trace:
> [<fffffffa0132967>]vhost_get_vq_desc+0x1e7/0x984 [vhost]
> [<fffffffa02037e6>]handle_rx+0x226/0x810 [vhost_net]
> [<fffffffa0203de5>]handle_rx_net+0x15/0x20 [vhost_net]
> [<fffffffa013160b>]vhost_worker+0xfb/0x1e0 [vhost]
> [<fffffffa0131510>]? vhost_dev_reset_owner+0x50/0x50 [vhost]
> [<fffffff810a5c7f>]kthread+0xcf/0xe0
> [<fffffff810a5bb0>]? kthread_create_on_node+0x140/0x140
> [<fffffff81648898>]ret_from_fork+0x58/0x90
> [<fffffff810a5bb0>]? kthread_create_on_node+0x140/0x140

So somehow you keep seeing something in tun when we peek.
IMO this is the problem we should try to fix.
Could you try debugging the root cause here?

> >> the error handling in vhost
> >> handle_rx() will continue. This will cause a soft CPU lockup in vhost thread.
> >> 
> >> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> >> ---
> >>  drivers/vhost/net.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >> 
> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >> index 5dc128a..edc470b 100644
> >> --- a/drivers/vhost/net.c
> >> +++ b/drivers/vhost/net.c
> >> @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
> >>  			pr_debug("Discarded rx packet: "
> >>  				 " len %d, expected %zd\n", err, sock_len);
> >>  			vhost_discard_vq_desc(vq, headcount);
> >> +			/* Don't continue to do, when meet errors. */
> >> +			if (err < 0)
> >> +				goto out;
> >
> >You might get e.g. EAGAIN and I think you need to retry
> >in this case.
> >
> >>  			continue;
> >>  		}
> >>  		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
> >> -- 
> >> 1.9.5.msysgit.1
> >>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
  2016-12-01  3:21     ` Michael S. Tsirkin
@ 2016-12-01  3:26       ` Jason Wang
  2016-12-01  3:27         ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2016-12-01  3:26 UTC (permalink / raw)
  To: Michael S. Tsirkin, wangyunjian; +Cc: netdev, linux-kernel, caihe



On 2016年12月01日 11:21, Michael S. Tsirkin wrote:
> On Thu, Dec 01, 2016 at 02:48:59AM +0000, wangyunjian wrote:
>>> -----Original Message-----
>>> From: Michael S. Tsirkin [mailto:mst@redhat.com]
>>> Sent: Wednesday, November 30, 2016 9:41 PM
>>> To: wangyunjian
>>> Cc: jasowang@redhat.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; caihe
>>> Subject: Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
>>>
>>> On Wed, Nov 30, 2016 at 08:10:57PM +0800, Yunjian Wang wrote:
>>>> When we meet an error(err=-EBADFD) recvmsg,
>>> How do you get EBADFD? Won't vhost_net_rx_peek_head_len
>>> return 0 in this case, breaking the loop?
>> We started many guest VMs while attaching/detaching some virtio-net nics for loop.
>> The soft lockup might happened. The err is -EBADFD.
>>
> OK, I'd like to figure out what happened here. why don't
> we get 0 when we peek at the head?
>
> EBADFD is from here:
>          struct tun_struct *tun = __tun_get(tfile);
> ...
>          if (!tun)
>                  return -EBADFD;
>
> but then:
> static int tun_peek_len(struct socket *sock)
> {
>
> ...
>
>          struct tun_struct *tun;
> ...
>          
>          tun = __tun_get(tfile);
>          if (!tun)
>                  return 0;
>
>
> so peek len should return 0.
>
> then while will exit:
>          while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk)))
> ...
>

Consider this case: user do ip link del link tap0 before recvmsg() but 
after tun_peek_len() ?

>> meesage log:
>> kernel:[609608.510180]BUG: soft lockup - CPU#18 stuck for 23s! [vhost-60898:126093]
>> call trace:
>> [<fffffffa0132967>]vhost_get_vq_desc+0x1e7/0x984 [vhost]
>> [<fffffffa02037e6>]handle_rx+0x226/0x810 [vhost_net]
>> [<fffffffa0203de5>]handle_rx_net+0x15/0x20 [vhost_net]
>> [<fffffffa013160b>]vhost_worker+0xfb/0x1e0 [vhost]
>> [<fffffffa0131510>]? vhost_dev_reset_owner+0x50/0x50 [vhost]
>> [<fffffff810a5c7f>]kthread+0xcf/0xe0
>> [<fffffff810a5bb0>]? kthread_create_on_node+0x140/0x140
>> [<fffffff81648898>]ret_from_fork+0x58/0x90
>> [<fffffff810a5bb0>]? kthread_create_on_node+0x140/0x140
> So somehow you keep seeing something in tun when we peek.
> IMO this is the problem we should try to fix.
> Could you try debugging the root cause here?
>
>>>> the error handling in vhost
>>>> handle_rx() will continue. This will cause a soft CPU lockup in vhost thread.
>>>>
>>>> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
>>>> ---
>>>>   drivers/vhost/net.c | 3 +++
>>>>   1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>>> index 5dc128a..edc470b 100644
>>>> --- a/drivers/vhost/net.c
>>>> +++ b/drivers/vhost/net.c
>>>> @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
>>>>   			pr_debug("Discarded rx packet: "
>>>>   				 " len %d, expected %zd\n", err, sock_len);
>>>>   			vhost_discard_vq_desc(vq, headcount);
>>>> +			/* Don't continue to do, when meet errors. */
>>>> +			if (err < 0)
>>>> +				goto out;
>>> You might get e.g. EAGAIN and I think you need to retry
>>> in this case.
>>>
>>>>   			continue;
>>>>   		}
>>>>   		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
>>>> -- 
>>>> 1.9.5.msysgit.1
>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
  2016-12-01  3:26       ` Jason Wang
@ 2016-12-01  3:27         ` Michael S. Tsirkin
  2016-12-01  3:37           ` Jason Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2016-12-01  3:27 UTC (permalink / raw)
  To: Jason Wang; +Cc: wangyunjian, netdev, linux-kernel, caihe

On Thu, Dec 01, 2016 at 11:26:21AM +0800, Jason Wang wrote:
> 
> 
> On 2016年12月01日 11:21, Michael S. Tsirkin wrote:
> > On Thu, Dec 01, 2016 at 02:48:59AM +0000, wangyunjian wrote:
> > > > -----Original Message-----
> > > > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > > > Sent: Wednesday, November 30, 2016 9:41 PM
> > > > To: wangyunjian
> > > > Cc: jasowang@redhat.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; caihe
> > > > Subject: Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
> > > > 
> > > > On Wed, Nov 30, 2016 at 08:10:57PM +0800, Yunjian Wang wrote:
> > > > > When we meet an error(err=-EBADFD) recvmsg,
> > > > How do you get EBADFD? Won't vhost_net_rx_peek_head_len
> > > > return 0 in this case, breaking the loop?
> > > We started many guest VMs while attaching/detaching some virtio-net nics for loop.
> > > The soft lockup might happened. The err is -EBADFD.
> > > 
> > OK, I'd like to figure out what happened here. why don't
> > we get 0 when we peek at the head?
> > 
> > EBADFD is from here:
> >          struct tun_struct *tun = __tun_get(tfile);
> > ...
> >          if (!tun)
> >                  return -EBADFD;
> > 
> > but then:
> > static int tun_peek_len(struct socket *sock)
> > {
> > 
> > ...
> > 
> >          struct tun_struct *tun;
> > ...
> >          tun = __tun_get(tfile);
> >          if (!tun)
> >                  return 0;
> > 
> > 
> > so peek len should return 0.
> > 
> > then while will exit:
> >          while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk)))
> > ...
> > 
> 
> Consider this case: user do ip link del link tap0 before recvmsg() but after
> tun_peek_len() ?

Sure, this can happen, but I think we'll just exit on the next loop,
won't we?

> > > meesage log:
> > > kernel:[609608.510180]BUG: soft lockup - CPU#18 stuck for 23s! [vhost-60898:126093]
> > > call trace:
> > > [<fffffffa0132967>]vhost_get_vq_desc+0x1e7/0x984 [vhost]
> > > [<fffffffa02037e6>]handle_rx+0x226/0x810 [vhost_net]
> > > [<fffffffa0203de5>]handle_rx_net+0x15/0x20 [vhost_net]
> > > [<fffffffa013160b>]vhost_worker+0xfb/0x1e0 [vhost]
> > > [<fffffffa0131510>]? vhost_dev_reset_owner+0x50/0x50 [vhost]
> > > [<fffffff810a5c7f>]kthread+0xcf/0xe0
> > > [<fffffff810a5bb0>]? kthread_create_on_node+0x140/0x140
> > > [<fffffff81648898>]ret_from_fork+0x58/0x90
> > > [<fffffff810a5bb0>]? kthread_create_on_node+0x140/0x140
> > So somehow you keep seeing something in tun when we peek.
> > IMO this is the problem we should try to fix.
> > Could you try debugging the root cause here?
> > 
> > > > > the error handling in vhost
> > > > > handle_rx() will continue. This will cause a soft CPU lockup in vhost thread.
> > > > > 
> > > > > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > > > > ---
> > > > >   drivers/vhost/net.c | 3 +++
> > > > >   1 file changed, 3 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > > > index 5dc128a..edc470b 100644
> > > > > --- a/drivers/vhost/net.c
> > > > > +++ b/drivers/vhost/net.c
> > > > > @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
> > > > >   			pr_debug("Discarded rx packet: "
> > > > >   				 " len %d, expected %zd\n", err, sock_len);
> > > > >   			vhost_discard_vq_desc(vq, headcount);
> > > > > +			/* Don't continue to do, when meet errors. */
> > > > > +			if (err < 0)
> > > > > +				goto out;
> > > > You might get e.g. EAGAIN and I think you need to retry
> > > > in this case.
> > > > 
> > > > >   			continue;
> > > > >   		}
> > > > >   		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
> > > > > -- 
> > > > > 1.9.5.msysgit.1
> > > > > 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
  2016-12-01  3:27         ` Michael S. Tsirkin
@ 2016-12-01  3:37           ` Jason Wang
  2016-12-01  4:41             ` wangyunjian
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2016-12-01  3:37 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: wangyunjian, netdev, linux-kernel, caihe



On 2016年12月01日 11:27, Michael S. Tsirkin wrote:
> On Thu, Dec 01, 2016 at 11:26:21AM +0800, Jason Wang wrote:
>> >
>> >
>> >On 2016年12月01日 11:21, Michael S. Tsirkin wrote:
>>> > >On Thu, Dec 01, 2016 at 02:48:59AM +0000, wangyunjian wrote:
>>>>> > > > >-----Original Message-----
>>>>> > > > >From: Michael S. Tsirkin [mailto:mst@redhat.com]
>>>>> > > > >Sent: Wednesday, November 30, 2016 9:41 PM
>>>>> > > > >To: wangyunjian
>>>>> > > > >Cc:jasowang@redhat.com;netdev@vger.kernel.org;linux-kernel@vger.kernel.org; caihe
>>>>> > > > >Subject: Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
>>>>> > > > >
>>>>> > > > >On Wed, Nov 30, 2016 at 08:10:57PM +0800, Yunjian Wang wrote:
>>>>>> > > > > >When we meet an error(err=-EBADFD) recvmsg,
>>>>> > > > >How do you get EBADFD? Won't vhost_net_rx_peek_head_len
>>>>> > > > >return 0 in this case, breaking the loop?
>>>> > > >We started many guest VMs while attaching/detaching some virtio-net nics for loop.
>>>> > > >The soft lockup might happened. The err is -EBADFD.
>>>> > > >
>>> > >OK, I'd like to figure out what happened here. why don't
>>> > >we get 0 when we peek at the head?
>>> > >
>>> > >EBADFD is from here:
>>> > >          struct tun_struct *tun = __tun_get(tfile);
>>> > >...
>>> > >          if (!tun)
>>> > >                  return -EBADFD;
>>> > >
>>> > >but then:
>>> > >static int tun_peek_len(struct socket *sock)
>>> > >{
>>> > >
>>> > >...
>>> > >
>>> > >          struct tun_struct *tun;
>>> > >...
>>> > >          tun = __tun_get(tfile);
>>> > >          if (!tun)
>>> > >                  return 0;
>>> > >
>>> > >
>>> > >so peek len should return 0.
>>> > >
>>> > >then while will exit:
>>> > >          while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk)))
>>> > >...
>>> > >
>> >
>> >Consider this case: user do ip link del link tap0 before recvmsg() but after
>> >tun_peek_len() ?
> Sure, this can happen, but I think we'll just exit on the next loop,
> won't we?
>

Right, this is the only case I can image for -EBADFD, let's wait for the 
author to the steps.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
  2016-12-01  3:37           ` Jason Wang
@ 2016-12-01  4:41             ` wangyunjian
  2016-12-01  4:56               ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: wangyunjian @ 2016-12-01  4:41 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin; +Cc: netdev, linux-kernel, caihe

>-----Original Message-----
>From: Jason Wang [mailto:jasowang@redhat.com] 
>Sent: Thursday, December 01, 2016 11:37 AM
>To: Michael S. Tsirkin
>Cc: wangyunjian; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; caihe
>Subject: Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
>
>
>
>On 2016年12月01日 11:27, Michael S. Tsirkin wrote:
>> On Thu, Dec 01, 2016 at 11:26:21AM +0800, Jason Wang wrote:
>>> >
>>> >
>>> >On 2016年12月01日 11:21, Michael S. Tsirkin wrote:
>>>> > >On Thu, Dec 01, 2016 at 02:48:59AM +0000, wangyunjian wrote:
>>>>>> > > > >-----Original Message-----
>>>>>> > > > >From: Michael S. Tsirkin [mailto:mst@redhat.com]
>>>>>> > > > >Sent: Wednesday, November 30, 2016 9:41 PM
>>>>>> > > > >To: wangyunjian
>>>>>> > > > >Cc:jasowang@redhat.com;netdev@vger.kernel.org;linux-kernel@
>>>>>> > > > >vger.kernel.org; caihe
>>>>>> > > > >Subject: Re: [PATCH net] vhost_net: don't continue to call 
>>>>>> > > > >the recvmsg when meet errors
>>>>>> > > > >
>>>>>> > > > >On Wed, Nov 30, 2016 at 08:10:57PM +0800, Yunjian Wang wrote:
>>>>>>> > > > > >When we meet an error(err=-EBADFD) recvmsg,
>>>>>> > > > >How do you get EBADFD? Won't vhost_net_rx_peek_head_len 
>>>>>> > > > >return 0 in this case, breaking the loop?
>>>>> > > >We started many guest VMs while attaching/detaching some virtio-net nics for loop.
>>>>> > > >The soft lockup might happened. The err is -EBADFD.
>>>>> > > >
>>>> > >OK, I'd like to figure out what happened here. why don't we get 0 
>>>> > >when we peek at the head?
>>>> > >
>>>> > >EBADFD is from here:
>>>> > >          struct tun_struct *tun = __tun_get(tfile); ...
>>>> > >          if (!tun)
>>>> > >                  return -EBADFD;
>>>> > >
>>>> > >but then:
>>>> > >static int tun_peek_len(struct socket *sock) {
>>>> > >
>>>> > >...
>>>> > >
>>>> > >          struct tun_struct *tun; ...
>>>> > >          tun = __tun_get(tfile);
>>>> > >          if (!tun)
>>>> > >                  return 0;
>>>> > >
>>>> > >
>>>> > >so peek len should return 0.
>>>> > >
>>>> > >then while will exit:
>>>> > >          while ((sock_len = vhost_net_rx_peek_head_len(net, 
>>>> > >sock->sk))) ...
>>>> > >
>>> >
>>> >Consider this case: user do ip link del link tap0 before recvmsg() 
>>> >but after
>>> >tun_peek_len() ?
>> Sure, this can happen, but I think we'll just exit on the next loop, 
>> won't we?
>>
>
>Right, this is the only case I can image for -EBADFD, let's wait for the author to the steps.
>

Thanks, I understand it don't happen in the latest kernel version.
My problem happened using kernel version 3.10.0-xx

The peek len willn't return 0.

static int peek_head_len(struct sock *sk)
{
	struct sk_buff *head;
	int len = 0;
	unsigned long flags;

	spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
	head = skb_peek(&sk->sk_receive_queue);
	if (likely(head)) {
		len = head->len;
		if (skb_vlan_tag_present(head))
			len += VLAN_HLEN;
	}

	spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
	return len;
}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
  2016-12-01  4:41             ` wangyunjian
@ 2016-12-01  4:56               ` Michael S. Tsirkin
  0 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2016-12-01  4:56 UTC (permalink / raw)
  To: wangyunjian; +Cc: Jason Wang, netdev, linux-kernel, caihe

On Thu, Dec 01, 2016 at 04:41:40AM +0000, wangyunjian wrote:
> >-----Original Message-----
> >From: Jason Wang [mailto:jasowang@redhat.com] 
> >Sent: Thursday, December 01, 2016 11:37 AM
> >To: Michael S. Tsirkin
> >Cc: wangyunjian; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; caihe
> >Subject: Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
> >
> >
> >
> >On 2016年12月01日 11:27, Michael S. Tsirkin wrote:
> >> On Thu, Dec 01, 2016 at 11:26:21AM +0800, Jason Wang wrote:
> >>> >
> >>> >
> >>> >On 2016年12月01日 11:21, Michael S. Tsirkin wrote:
> >>>> > >On Thu, Dec 01, 2016 at 02:48:59AM +0000, wangyunjian wrote:
> >>>>>> > > > >-----Original Message-----
> >>>>>> > > > >From: Michael S. Tsirkin [mailto:mst@redhat.com]
> >>>>>> > > > >Sent: Wednesday, November 30, 2016 9:41 PM
> >>>>>> > > > >To: wangyunjian
> >>>>>> > > > >Cc:jasowang@redhat.com;netdev@vger.kernel.org;linux-kernel@
> >>>>>> > > > >vger.kernel.org; caihe
> >>>>>> > > > >Subject: Re: [PATCH net] vhost_net: don't continue to call 
> >>>>>> > > > >the recvmsg when meet errors
> >>>>>> > > > >
> >>>>>> > > > >On Wed, Nov 30, 2016 at 08:10:57PM +0800, Yunjian Wang wrote:
> >>>>>>> > > > > >When we meet an error(err=-EBADFD) recvmsg,
> >>>>>> > > > >How do you get EBADFD? Won't vhost_net_rx_peek_head_len 
> >>>>>> > > > >return 0 in this case, breaking the loop?
> >>>>> > > >We started many guest VMs while attaching/detaching some virtio-net nics for loop.
> >>>>> > > >The soft lockup might happened. The err is -EBADFD.
> >>>>> > > >
> >>>> > >OK, I'd like to figure out what happened here. why don't we get 0 
> >>>> > >when we peek at the head?
> >>>> > >
> >>>> > >EBADFD is from here:
> >>>> > >          struct tun_struct *tun = __tun_get(tfile); ...
> >>>> > >          if (!tun)
> >>>> > >                  return -EBADFD;
> >>>> > >
> >>>> > >but then:
> >>>> > >static int tun_peek_len(struct socket *sock) {
> >>>> > >
> >>>> > >...
> >>>> > >
> >>>> > >          struct tun_struct *tun; ...
> >>>> > >          tun = __tun_get(tfile);
> >>>> > >          if (!tun)
> >>>> > >                  return 0;
> >>>> > >
> >>>> > >
> >>>> > >so peek len should return 0.
> >>>> > >
> >>>> > >then while will exit:
> >>>> > >          while ((sock_len = vhost_net_rx_peek_head_len(net, 
> >>>> > >sock->sk))) ...
> >>>> > >
> >>> >
> >>> >Consider this case: user do ip link del link tap0 before recvmsg() 
> >>> >but after
> >>> >tun_peek_len() ?
> >> Sure, this can happen, but I think we'll just exit on the next loop, 
> >> won't we?
> >>
> >
> >Right, this is the only case I can image for -EBADFD, let's wait for the author to the steps.
> >
> 
> Thanks, I understand it don't happen in the latest kernel version.
> My problem happened using kernel version 3.10.0-xx
> The peek len willn't return 0.
> 
> static int peek_head_len(struct sock *sk)
> {
> 	struct sk_buff *head;
> 	int len = 0;
> 	unsigned long flags;
> 
> 	spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
> 	head = skb_peek(&sk->sk_receive_queue);

Should return NULL, should it not?
Maybe sk_receive_queue was not purged on detach back then.


> 	if (likely(head)) {
> 		len = head->len;
> 		if (skb_vlan_tag_present(head))
> 			len += VLAN_HLEN;
> 	}
> 
> 	spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
> 	return len;
> }

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-12-01  4:57 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-30 12:10 [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors Yunjian Wang
2016-11-30 13:07 ` Jason Wang
2016-11-30 13:43   ` Michael S. Tsirkin
2016-11-30 13:40 ` Michael S. Tsirkin
2016-12-01  2:48   ` wangyunjian
2016-12-01  3:13     ` Jason Wang
2016-12-01  3:21     ` Michael S. Tsirkin
2016-12-01  3:26       ` Jason Wang
2016-12-01  3:27         ` Michael S. Tsirkin
2016-12-01  3:37           ` Jason Wang
2016-12-01  4:41             ` wangyunjian
2016-12-01  4:56               ` Michael S. Tsirkin
2016-12-01  3:15   ` Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).