* Re: [PATCH] virtio_ring: fix packed ring event may missing
       [not found] <20191021171004.18729-1-yong.liu@intel.com>
@ 2019-10-22  2:44 ` Jason Wang
       [not found]   ` <86228AFD5BCD8E4EBFD2B90117B5E81E633D74EF@SHSMSX103.ccr.corp.intel.com>
  2019-10-25  9:32 ` Michael S. Tsirkin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Jason Wang @ 2019-10-22  2:44 UTC (permalink / raw)
  To: Marvin Liu, mst, tiwei.bie; +Cc: virtualization


On 2019/10/22 1:10 AM, Marvin Liu wrote:
> When callback is delayed, virtio expect that vhost will kick when
> rolling over event offset. Recheck should be taken as used index may
> exceed event offset between status check and driver event update.
>
> However, it is possible that flags was not modified if descriptors are
> chained or in_order feature was negotiated. So flags at event offset
> may not be valid for descriptor's status checking. Fix it by using last
> used index as replacement. Tx queue will be stopped if there's not
> enough freed buffers after recheck.
>
> Signed-off-by: Marvin Liu <yong.liu@intel.com>
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index bdc08244a648..a8041e451e9e 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1499,9 +1499,6 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>   		 * counter first before updating event flags.
>   		 */
>   		virtio_wmb(vq->weak_barriers);
> -	} else {
> -		used_idx = vq->last_used_idx;
> -		wrap_counter = vq->packed.used_wrap_counter;
>   	}
>   
>   	if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DISABLE) {
> @@ -1518,7 +1515,9 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>   	 */
>   	virtio_mb(vq->weak_barriers);
>   
> -	if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
> +	if (is_used_desc_packed(vq,
> +				vq->last_used_idx,
> +				vq->packed.used_wrap_counter)) {
>   		END_USE(vq);
>   		return false;
>   	}
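
The test this hunk switches to, is_used_desc_packed() on the last used index, boils down to comparing the descriptor's AVAIL and USED flag bits against the ring's wrap counter. A simplified, self-contained sketch of that check (flag bit positions follow the virtio 1.1 packed-ring layout; the struct is reduced to just the flags field for illustration, so this is not the kernel code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bit positions per the virtio 1.1 packed-ring layout. */
#define VRING_PACKED_DESC_F_AVAIL (1 << 7)
#define VRING_PACKED_DESC_F_USED  (1 << 15)

/* Reduced descriptor: only the flags field matters for this check. */
struct packed_desc {
	uint16_t flags;
};

/* A descriptor counts as used when its AVAIL and USED bits agree with
 * each other and with the driver's current used wrap counter. */
static bool is_used_desc_packed(const struct packed_desc *ring,
				uint16_t idx, bool wrap_counter)
{
	bool avail = !!(ring[idx].flags & VRING_PACKED_DESC_F_AVAIL);
	bool used  = !!(ring[idx].flags & VRING_PACKED_DESC_F_USED);

	return avail == used && used == wrap_counter;
}
```

A descriptor written out by the device has both bits equal to the current wrap counter; after the device wraps around the ring, the expected value flips, which is why the wrap counter must travel together with the index being checked.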


Hi Marvin:

Two questions:

1) Do we support IN_ORDER in kernel driver?

2) Should we check IN_ORDER in this case? Otherwise we may end up with an
interrupt storm when IN_ORDER is not negotiated.

Thanks


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

* Re: [PATCH] virtio_ring: fix packed ring event may missing
       [not found]   ` <86228AFD5BCD8E4EBFD2B90117B5E81E633D74EF@SHSMSX103.ccr.corp.intel.com>
@ 2019-10-22 13:05     ` Jason Wang
       [not found]       ` <86228AFD5BCD8E4EBFD2B90117B5E81E633DA298@SHSMSX103.ccr.corp.intel.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Wang @ 2019-10-22 13:05 UTC (permalink / raw)
  To: Liu, Yong, mst, Bie, Tiwei; +Cc: virtualization


On 2019/10/22 2:48 PM, Liu, Yong wrote:
> Hi Jason,
> My answers are inline.
>
>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang@redhat.com]
>> Sent: Tuesday, October 22, 2019 10:45 AM
>> To: Liu, Yong <yong.liu@intel.com>; mst@redhat.com; Bie, Tiwei
>> <tiwei.bie@intel.com>
>> Cc: virtualization@lists.linux-foundation.org
>> Subject: Re: [PATCH] virtio_ring: fix packed ring event may missing
>>
>>
>> On 2019/10/22 1:10 AM, Marvin Liu wrote:
>>> When callback is delayed, virtio expect that vhost will kick when
>>> rolling over event offset. Recheck should be taken as used index may
>>> exceed event offset between status check and driver event update.
>>>
>>> However, it is possible that flags was not modified if descriptors are
>>> chained or in_order feature was negotiated. So flags at event offset
>>> may not be valid for descriptor's status checking. Fix it by using last
>>> used index as replacement. Tx queue will be stopped if there's not
>>> enough freed buffers after recheck.
>>>
>>> Signed-off-by: Marvin Liu <yong.liu@intel.com>
>>>
>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>> index bdc08244a648..a8041e451e9e 100644
>>> --- a/drivers/virtio/virtio_ring.c
>>> +++ b/drivers/virtio/virtio_ring.c
>>> @@ -1499,9 +1499,6 @@ static bool
>> virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>>>    		 * counter first before updating event flags.
>>>    		 */
>>>    		virtio_wmb(vq->weak_barriers);
>>> -	} else {
>>> -		used_idx = vq->last_used_idx;
>>> -		wrap_counter = vq->packed.used_wrap_counter;
>>>    	}
>>>
>>>    	if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DISABLE)
>> {
>>> @@ -1518,7 +1515,9 @@ static bool
>> virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>>>    	 */
>>>    	virtio_mb(vq->weak_barriers);
>>>
>>> -	if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
>>> +	if (is_used_desc_packed(vq,
>>> +				vq->last_used_idx,
>>> +				vq->packed.used_wrap_counter)) {
>>>    		END_USE(vq);
>>>    		return false;
>>>    	}
>>
>> Hi Marvin:
>>
>> Two questions:
>>
>> 1) Do we support IN_ORDER in kernel driver?
>>
> Not supported for now. But the issue can still occur if INDIRECT is disabled and the descs are chained.
> Because packed ring desc status must be checked one by one, choosing an arbitrary position may cause the issue.


Right, then it's better to mention IN_ORDER as a future feature.


>
>> 2) Should we check IN_ORDER in this case otherwise we may end up with
>> interrupt storm when IN_ORDER is not negotiated?
> The interrupt count will not increase here; the event offset value is calculated as before.
> Here we just recheck whether the newly used descs are enough for the next round of xmit.
> If the backend is slow, most likely the Tx queue will sleep for a while until the used index goes past the event offset.


Ok, but what if the backend is almost as fast as the guest driver? E.g. in
virtio-net we have:

     if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
         netif_stop_subqueue(dev, qnum);
         if (!use_napi &&
             unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
             /* More just got used, free them then recheck. */
             free_old_xmit_skbs(sq, false);
             if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
                 netif_start_subqueue(dev, qnum);
                 virtqueue_disable_cb(sq->vq);
             }
         }
     }

I worry that we may end up toggling the queue state in the case where
sq->vq->num_free is near 2 + MAX_SKB_FRAGS.

It looks to me that the correct thing to implement is to calculate the head
descriptor of the chain that sits at the 3/4 mark.
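
The 3/4 point mentioned above comes from the event-offset computation in virtqueue_enable_cb_delayed_packed(). A rough, stand-alone sketch of that index arithmetic (the helper, its parameters, and the example values are illustrative, not the kernel code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Pick the used index that sits ~3/4 of the outstanding descriptors
 * ahead of last_used_idx, flipping the wrap counter if the offset
 * crosses the end of the ring.  Sketch of the EVENT_IDX branch of
 * virtqueue_enable_cb_delayed_packed(), not the kernel code itself. */
static uint16_t event_idx_three_quarters(uint16_t ring_size,
					 uint16_t num_free,
					 uint16_t last_used_idx,
					 bool *wrap_counter)
{
	uint16_t bufs = (uint16_t)((ring_size - num_free) * 3 / 4);
	uint16_t used_idx = last_used_idx + bufs;

	if (used_idx >= ring_size) {
		used_idx -= ring_size;
		*wrap_counter ^= 1;
	}
	return used_idx;
}
```

For example, with ring_size 256, num_free 56 and last_used_idx 200, this yields index 94 with the wrap counter flipped. The concern under discussion is that such an index can land in the middle of a descriptor chain, a slot the device may never write flags into, hence the suggestion to locate the chain head instead.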

Thanks


>
> Thanks,
> Marvin
>
>> Thanks
>>

* Re: [PATCH] virtio_ring: fix packed ring event may missing
       [not found]       ` <86228AFD5BCD8E4EBFD2B90117B5E81E633DA298@SHSMSX103.ccr.corp.intel.com>
@ 2019-10-24  3:50         ` Jason Wang
  2019-10-27  9:54           ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Wang @ 2019-10-24  3:50 UTC (permalink / raw)
  To: Liu, Yong, mst, Bie, Tiwei; +Cc: virtualization


On 2019/10/24 11:26 AM, Liu, Yong wrote:
>
>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang@redhat.com]
>> Sent: Tuesday, October 22, 2019 9:06 PM
>> To: Liu, Yong <yong.liu@intel.com>; mst@redhat.com; Bie, Tiwei
>> <tiwei.bie@intel.com>
>> Cc: virtualization@lists.linux-foundation.org
>> Subject: Re: [PATCH] virtio_ring: fix packed ring event may missing
>>
>>
>> On 2019/10/22 2:48 PM, Liu, Yong wrote:
>>> Hi Jason,
>>> My answers are inline.
>>>
>>>> -----Original Message-----
>>>> From: Jason Wang [mailto:jasowang@redhat.com]
>>>> Sent: Tuesday, October 22, 2019 10:45 AM
>>>> To: Liu, Yong <yong.liu@intel.com>; mst@redhat.com; Bie, Tiwei
>>>> <tiwei.bie@intel.com>
>>>> Cc: virtualization@lists.linux-foundation.org
>>>> Subject: Re: [PATCH] virtio_ring: fix packed ring event may missing
>>>>
>>>>
>>>> On 2019/10/22 1:10 AM, Marvin Liu wrote:
>>>>> When callback is delayed, virtio expect that vhost will kick when
>>>>> rolling over event offset. Recheck should be taken as used index may
>>>>> exceed event offset between status check and driver event update.
>>>>>
>>>>> However, it is possible that flags was not modified if descriptors are
>>>>> chained or in_order feature was negotiated. So flags at event offset
>>>>> may not be valid for descriptor's status checking. Fix it by using last
>>>>> used index as replacement. Tx queue will be stopped if there's not
>>>>> enough freed buffers after recheck.
>>>>>
>>>>> Signed-off-by: Marvin Liu <yong.liu@intel.com>
>>>>>
>>>>> diff --git a/drivers/virtio/virtio_ring.c
>> b/drivers/virtio/virtio_ring.c
>>>>> index bdc08244a648..a8041e451e9e 100644
>>>>> --- a/drivers/virtio/virtio_ring.c
>>>>> +++ b/drivers/virtio/virtio_ring.c
>>>>> @@ -1499,9 +1499,6 @@ static bool
>>>> virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>>>>>     		 * counter first before updating event flags.
>>>>>     		 */
>>>>>     		virtio_wmb(vq->weak_barriers);
>>>>> -	} else {
>>>>> -		used_idx = vq->last_used_idx;
>>>>> -		wrap_counter = vq->packed.used_wrap_counter;
>>>>>     	}
>>>>>
>>>>>     	if (vq->packed.event_flags_shadow ==
>> VRING_PACKED_EVENT_FLAG_DISABLE)
>>>> {
>>>>> @@ -1518,7 +1515,9 @@ static bool
>>>> virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>>>>>     	 */
>>>>>     	virtio_mb(vq->weak_barriers);
>>>>>
>>>>> -	if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
>>>>> +	if (is_used_desc_packed(vq,
>>>>> +				vq->last_used_idx,
>>>>> +				vq->packed.used_wrap_counter)) {
>>>>>     		END_USE(vq);
>>>>>     		return false;
>>>>>     	}
>>>> Hi Marvin:
>>>>
>>>> Two questions:
>>>>
>>>> 1) Do we support IN_ORDER in kernel driver?
>>>>
>>> Not support by now. But issue still can be possible if in_direct disabled
>> and meanwhile descs are chained.
>>> Due to packed ring desc status should check one by one, chose arbitrary
>> position may cause issue.
>>
>>
>> Right, then it's better to mention IN_ORDER as future features.
>>
>>
>>>> 2) Should we check IN_ORDER in this case otherwise we may end up with
>>>> interrupt storm when IN_ORDER is not negotiated?
>>> Interrupt number will not increase here, event offset value calculated as
>> before.
>>> Here just recheck whether new used descs is enough for next around xmit.
>>> If backend was slow, most likely Tx queue will sleep for a while until
>> used index go over event offset.
>>
>>
>> Ok, but what if the backend is almost as fast as guest driver? E.g in
>> virtio-net we had:
>>
>>       if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
>>           netif_stop_subqueue(dev, qnum);
>>           if (!use_napi &&
>>               unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>>               /* More just got used, free them then recheck. */
>>               free_old_xmit_skbs(sq, false);
>>               if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>>                   netif_start_subqueue(dev, qnum);
>>                   virtqueue_disable_cb(sq->vq);
>>               }
>>           }
>>       }
>>
>> I worry that we may end up with toggling queue state in the case
>> (sq->vq->num_free is near 2 + MAX_SKB_FRAGS).
>>
> Yes, in this worst case each packet will add two extra event flags writes. Since the backend only reads this value, the cost won't be too much.


For the driver, it means extra overhead: atomics, less batching, stats
updating, etc. For the backend, the cacheline will bounce between two CPUs.


> Even if we track down chained descs and figure out whether the desc at the event offset is used, there is still a possibility that the flags are invalid.
> One case is that the backend can buffer multiple descs by not updating the first one. We cannot guarantee that a later desc's flags are usable until we check from the first one.


In this case, since we've stopped the tx queue, no new buffers are
added. It doesn't matter whether we get notified when 3/4 or all of the
descriptors have been used.

Thanks


>
> Regards,
> Marvin
>
>> It looks to me the correct thing to implement is to calculate the head
>> descriptor of a chain that sits at 3/4.
>>
>> Thanks
>>
>>
>>> Thanks,
>>> Marvin
>>>
>>>> Thanks
>>>>

* Re: [PATCH] virtio_ring: fix packed ring event may missing
       [not found] <20191021171004.18729-1-yong.liu@intel.com>
  2019-10-22  2:44 ` [PATCH] virtio_ring: fix packed ring event may missing Jason Wang
@ 2019-10-25  9:32 ` Michael S. Tsirkin
  2019-10-27  9:09   ` Michael S. Tsirkin
  2019-10-25 10:53 ` Michael S. Tsirkin
  2019-10-27  9:51 ` Michael S. Tsirkin
  3 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2019-10-25  9:32 UTC (permalink / raw)
  To: Marvin Liu; +Cc: virtualization

On Tue, Oct 22, 2019 at 01:10:04AM +0800, Marvin Liu wrote:
> When callback is delayed, virtio expect that vhost will kick when
> rolling over event offset. Recheck should be taken as used index may
> exceed event offset between status check and driver event update.
> 
> However, it is possible that flags was not modified if descriptors are
> chained or in_order feature was negotiated. So flags at event offset

This mention of event offset I don't understand: your patch
only affects code that runs when !event. So how can it
affect event offset?



> may not be valid for descriptor's status checking. Fix it by using last
> used index as replacement. Tx queue will be stopped if there's not
> enough freed buffers after recheck.
> 
> Signed-off-by: Marvin Liu <yong.liu@intel.com>
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index bdc08244a648..a8041e451e9e 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1499,9 +1499,6 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>  		 * counter first before updating event flags.
>  		 */
>  		virtio_wmb(vq->weak_barriers);
> -	} else {
> -		used_idx = vq->last_used_idx;
> -		wrap_counter = vq->packed.used_wrap_counter;
>  	}


Is all this theoretical? Or did you actually see a problem
and then fix it?
Because as far as I could see after this patch and with
event index off, used_idx and wrap_counter will be used
without being initialized.

OTOH the behaviour with event index on is completely unaffected.


>  
>  	if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DISABLE) {

OK so trying to unpack the scenario.

First, your patch only affects code running when EVENT_IDX is off, so
legal values for flags are enable and disable.


Next point, this calculates the index at which we are going
to look for the flags to change, in other words
it affects the line
        if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
below.

Without your patch, we simply look at the next descriptor.
This is exactly what the spec says we should do:

	Writes of device and driver descriptors can generally be
	reordered, but each side (driver and device) are only required to
	poll (or test) a single location in memory: the next device descriptor after
	the one they processed previously, in circular order.






> @@ -1518,7 +1515,9 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>  	 */
>  	virtio_mb(vq->weak_barriers);
>  
> -	if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
> +	if (is_used_desc_packed(vq,
> +				vq->last_used_idx,
> +				vq->packed.used_wrap_counter)) {
>  		END_USE(vq);
>  		return false;
>  	}
> -- 
> 2.17.1

* Re: [PATCH] virtio_ring: fix packed ring event may missing
       [not found] <20191021171004.18729-1-yong.liu@intel.com>
  2019-10-22  2:44 ` [PATCH] virtio_ring: fix packed ring event may missing Jason Wang
  2019-10-25  9:32 ` Michael S. Tsirkin
@ 2019-10-25 10:53 ` Michael S. Tsirkin
  2019-10-27  9:12   ` Michael S. Tsirkin
  2019-10-27  9:51 ` Michael S. Tsirkin
  3 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2019-10-25 10:53 UTC (permalink / raw)
  To: Marvin Liu; +Cc: virtualization

On Tue, Oct 22, 2019 at 01:10:04AM +0800, Marvin Liu wrote:
> When callback is delayed, virtio expect that vhost will kick when
> rolling over event offset. Recheck should be taken as used index may
> exceed event offset between status check and driver event update.
> 
> However, it is possible that flags was not modified if descriptors are
> chained or in_order feature was negotiated. So flags at event offset
> may not be valid for descriptor's status checking. Fix it by using last
> used index as replacement. Tx queue will be stopped if there's not
> enough freed buffers after recheck.
> 
> Signed-off-by: Marvin Liu <yong.liu@intel.com>

So as I wrote separately, I don't think the patch is correct.  And it
doesn't look like it was tested properly.  However, I do think
virtqueue_enable_cb_delayed_packed is implemented incorrectly in case of
chained s/g and ring that is close to being full.  It's just that
chained only happens on OOM at the moment, so it's hard to trigger.
virtio_ring.c can be hacked to force chained to make it trigger more.
And with napi_tx being the default, this function is barely used at all.

So it's hard to test, and I think in this area we really should include
info on how patch was tested before applying.


But just theoretically speaking I agree: the trick
virtqueue_enable_cb_delayed_packed is using to check that 3/4 of the
ring has been used just by looking at a single s/g entry can't work
reliably.

We generally really need to scan the ring.  How much to scan?  If we
want to implement this efficiently, I propose that we track the number
of outstanding bufs.  Then # of s/g - # of bufs is the max number of
entries we need to scan. For the common case of a single s/g this will
give exactly 0.


Alternatively, we can just scan the whole ring up to the index using the
standard logic. virtqueue_enable_cb_delayed_packed is rare enough.
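
The second alternative, scanning the ring up to the index with the standard logic, can be sketched as follows: walk descriptors one at a time from a starting index, stop at the first one that is not yet used, and flip the wrap counter at the ring boundary. This is a simplified stand-alone model, not the kernel implementation; the flag bits follow the virtio 1.1 packed-ring layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define VRING_PACKED_DESC_F_AVAIL (1 << 7)
#define VRING_PACKED_DESC_F_USED  (1 << 15)

struct packed_desc {
	uint16_t flags;
};

/* Count how many consecutive descriptors starting at idx have been
 * used, scanning at most max_scan entries and flipping the wrap
 * counter when the scan wraps past the end of the ring. */
static uint16_t count_used_from(const struct packed_desc *ring,
				uint16_t ring_size, uint16_t idx,
				bool wrap_counter, uint16_t max_scan)
{
	uint16_t n = 0;

	while (n < max_scan) {
		bool avail = !!(ring[idx].flags & VRING_PACKED_DESC_F_AVAIL);
		bool used  = !!(ring[idx].flags & VRING_PACKED_DESC_F_USED);

		if (avail != used || used != wrap_counter)
			break;	/* first not-yet-used descriptor */
		n++;
		if (++idx == ring_size) {
			idx = 0;
			wrap_counter = !wrap_counter;
		}
	}
	return n;
}
```

With the outstanding-buffer bookkeeping proposed above, max_scan would be (# of s/g entries) - (# of bufs), which is 0 in the common single-descriptor case, so the loop body would rarely run at all.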



> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index bdc08244a648..a8041e451e9e 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1499,9 +1499,6 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>  		 * counter first before updating event flags.
>  		 */
>  		virtio_wmb(vq->weak_barriers);
> -	} else {
> -		used_idx = vq->last_used_idx;
> -		wrap_counter = vq->packed.used_wrap_counter;
>  	}
>  
>  	if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DISABLE) {
> @@ -1518,7 +1515,9 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>  	 */
>  	virtio_mb(vq->weak_barriers);
>  
> -	if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
> +	if (is_used_desc_packed(vq,
> +				vq->last_used_idx,
> +				vq->packed.used_wrap_counter)) {
>  		END_USE(vq);
>  		return false;
>  	}
> -- 
> 2.17.1

* Re: [PATCH] virtio_ring: fix packed ring event may missing
  2019-10-25  9:32 ` Michael S. Tsirkin
@ 2019-10-27  9:09   ` Michael S. Tsirkin
  0 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2019-10-27  9:09 UTC (permalink / raw)
  To: Marvin Liu; +Cc: virtualization

On Fri, Oct 25, 2019 at 05:32:49AM -0400, Michael S. Tsirkin wrote:
> On Tue, Oct 22, 2019 at 01:10:04AM +0800, Marvin Liu wrote:
> > When callback is delayed, virtio expect that vhost will kick when
> > rolling over event offset. Recheck should be taken as used index may
> > exceed event offset between status check and driver event update.
> > 
> > However, it is possible that flags was not modified if descriptors are
> > chained or in_order feature was negotiated. So flags at event offset
> 
> This mention of event offset I don't understand: your patch
> only affects code that runs when !event. So how can it
> affect event offset?
> 
> 
> 
> > may not be valid for descriptor's status checking. Fix it by using last
> > used index as replacement. Tx queue will be stopped if there's not
> > enough freed buffers after recheck.
> > 
> > Signed-off-by: Marvin Liu <yong.liu@intel.com>
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index bdc08244a648..a8041e451e9e 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1499,9 +1499,6 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> >  		 * counter first before updating event flags.
> >  		 */
> >  		virtio_wmb(vq->weak_barriers);
> > -	} else {
> > -		used_idx = vq->last_used_idx;
> > -		wrap_counter = vq->packed.used_wrap_counter;
> >  	}
> 
> 
> Is all this theoretical? Or did you actually see a problem
> and then fix it?
> Because as far as I could see after this patch and with
> event index off, used_idx and wrap_counter will be used
> without being initialized.
> 
> OTOH the behaviour with event index on is completely unaffected.
> 
> 
> >  
> >  	if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DISABLE) {
> 
> OK so trying to unpack the scenario.
> 
> First, your patch only affects code running when EVENT_IDX is off, so
> legal values for flags are enable and disable.
> 
> 
> Next point, this calculates the index at which we are going
> to look for the flags to change, in other words
> it affects the line
>         if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
> below.
> 
> Without your patch, we simply look at the next descriptor.
> This is exactly what the spec says we should do:
> 
> 	Writes of device and driver descriptors can generally be
> 	reordered, but each side (driver and device) are only required to
> 	poll (or test) a single location in memory: the next device descriptor after
> 	the one they processed previously, in circular order.
> 

OK please ignore all this, I misunderstood the patch. Sorry about the
noise.


> 
> 
> 
> 
> > @@ -1518,7 +1515,9 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> >  	 */
> >  	virtio_mb(vq->weak_barriers);
> >  
> > -	if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
> > +	if (is_used_desc_packed(vq,
> > +				vq->last_used_idx,
> > +				vq->packed.used_wrap_counter)) {
> >  		END_USE(vq);
> >  		return false;
> >  	}
> > -- 
> > 2.17.1

* Re: [PATCH] virtio_ring: fix packed ring event may missing
  2019-10-25 10:53 ` Michael S. Tsirkin
@ 2019-10-27  9:12   ` Michael S. Tsirkin
  0 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2019-10-27  9:12 UTC (permalink / raw)
  To: Marvin Liu; +Cc: virtualization

On Fri, Oct 25, 2019 at 06:53:55AM -0400, Michael S. Tsirkin wrote:
> On Tue, Oct 22, 2019 at 01:10:04AM +0800, Marvin Liu wrote:
> > When callback is delayed, virtio expect that vhost will kick when
> > rolling over event offset. Recheck should be taken as used index may
> > exceed event offset between status check and driver event update.
> > 
> > However, it is possible that flags was not modified if descriptors are
> > chained or in_order feature was negotiated. So flags at event offset
> > may not be valid for descriptor's status checking. Fix it by using last
> > used index as replacement. Tx queue will be stopped if there's not
> > enough freed buffers after recheck.
> > 
> > Signed-off-by: Marvin Liu <yong.liu@intel.com>
> 
> So as I wrote separately, I don't think the patch is correct.  And it
> doesn't look like it was tested properly.

I was wrong to say this: the patch is correct.

Thanks for working on this, and sorry about creating the confusion.

It might not be the
optimal thing to do but it fixes a real bug, so I think it's
a good idea to merge it for now, and implement optimizations on top.

>  However, I do think
> virtqueue_enable_cb_delayed_packed is implemented incorrectly in case of
> chained s/g and ring that is close to being full.  It's just that
> chained only happens on OOM at the moment, so it's hard to trigger.
> virtio_ring.c can be hacked to force chained to make it trigger more.
> And with napi_tx being the default, this function is barely used at all.
> 
> So it's hard to test, and I think in this area we really should include
> info on how patch was tested before applying.
> 
> 
> But just theoretically speaking I agree: the trick
> virtqueue_enable_cb_delayed_packed is using to check that 3/4 of the
> ring has been used just by looking at a single s/g entry can't work
> reliably.
> 
> We generally really need to scan the ring.  How much to scan?  If we
> want to implement this efficiently, I propose that we track the number
> of outstanding bufs.  Then # of s/g - # of bufs is the max number of
> entries we need to scan. For the common case of a single s/g this will
> give exactly 0.
> 
> 
> Alternatively, we can just scan the whole ring up to the index using the
> standard logic. virtqueue_enable_cb_delayed_packed is rare enough.
> 
> 
> 
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index bdc08244a648..a8041e451e9e 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1499,9 +1499,6 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> >  		 * counter first before updating event flags.
> >  		 */
> >  		virtio_wmb(vq->weak_barriers);
> > -	} else {
> > -		used_idx = vq->last_used_idx;
> > -		wrap_counter = vq->packed.used_wrap_counter;
> >  	}
> >  
> >  	if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DISABLE) {
> > @@ -1518,7 +1515,9 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> >  	 */
> >  	virtio_mb(vq->weak_barriers);
> >  
> > -	if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
> > +	if (is_used_desc_packed(vq,
> > +				vq->last_used_idx,
> > +				vq->packed.used_wrap_counter)) {
> >  		END_USE(vq);
> >  		return false;
> >  	}
> > -- 
> > 2.17.1

* Re: [PATCH] virtio_ring: fix packed ring event may missing
       [not found] <20191021171004.18729-1-yong.liu@intel.com>
                   ` (2 preceding siblings ...)
  2019-10-25 10:53 ` Michael S. Tsirkin
@ 2019-10-27  9:51 ` Michael S. Tsirkin
  3 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2019-10-27  9:51 UTC (permalink / raw)
  To: Marvin Liu; +Cc: virtualization

On Tue, Oct 22, 2019 at 01:10:04AM +0800, Marvin Liu wrote:
> When callback is delayed, virtio expect that vhost will kick when
> rolling over event offset. Recheck should be taken as used index may
> exceed event offset between status check and driver event update.
> 
> However, it is possible that flags was not modified if descriptors are
> chained or in_order feature was negotiated. So flags at event offset
> may not be valid for descriptor's status checking. Fix it by using last
> used index as replacement. Tx queue will be stopped if there's not
> enough freed buffers after recheck.
> 
> Signed-off-by: Marvin Liu <yong.liu@intel.com>

OK I rewrote the commit log slightly:
	When VIRTIO_F_RING_EVENT_IDX is negotiated, virtio devices can
use virtqueue_enable_cb_delayed_packed to reduce the number of device
interrupts.  At the moment, this is the case for virtio-net when
the napi_tx module parameter is set to false.

In this case, the virtio driver selects an event offset in the ring and expects
that the device will send a notification when rolling over the event
offset in the ring.  However, if this roll-over happens before the event
suppression structure update, the notification won't be sent. To address
this race condition the driver needs to check whether the
device rolled over this offset after updating the event suppression structure.

With VIRTIO_F_RING_PACKED, the virtio driver did this by reading the
flags field of the descriptor at the specified offset.

	Unfortunately, checking at the event offset isn't reliable: if
descriptors are chained (e.g. when INDIRECT is off) not all descriptors
are overwritten by the device, so it's possible that the device skipped
the specific descriptor the driver is checking when writing out used
descriptors. If this happens, the driver won't detect the race condition
and will incorrectly expect the device to send a notification.

For virtio-net, the result will be a TX queue stall, with transmission
blocked forever.

	With the packed ring, it isn't easy to find a location which is
guaranteed to change upon the roll-over, except the next device
descriptor, as described in the spec:

	Writes of device and driver descriptors can generally be
	reordered, but each side (driver and device) are only required to
	poll (or test) a single location in memory: the next device descriptor after
	the one they processed previously, in circular order.

While this might be sub-optimal, let's do exactly this for now.

And applied this.

Thanks a lot for working on this, and sorry again for not understanding
the patch originally and thinking it was not tested!

> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index bdc08244a648..a8041e451e9e 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1499,9 +1499,6 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>  		 * counter first before updating event flags.
>  		 */
>  		virtio_wmb(vq->weak_barriers);
> -	} else {
> -		used_idx = vq->last_used_idx;
> -		wrap_counter = vq->packed.used_wrap_counter;
>  	}
>  
>  	if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DISABLE) {
> @@ -1518,7 +1515,9 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>  	 */
>  	virtio_mb(vq->weak_barriers);
>  
> -	if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
> +	if (is_used_desc_packed(vq,
> +				vq->last_used_idx,
> +				vq->packed.used_wrap_counter)) {
>  		END_USE(vq);
>  		return false;
>  	}
> -- 
> 2.17.1

* Re: [PATCH] virtio_ring: fix packed ring event may missing
  2019-10-24  3:50         ` Jason Wang
@ 2019-10-27  9:54           ` Michael S. Tsirkin
  0 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2019-10-27  9:54 UTC (permalink / raw)
  To: Jason Wang; +Cc: Liu, Yong, virtualization

On Thu, Oct 24, 2019 at 11:50:51AM +0800, Jason Wang wrote:
> 
> On 2019/10/24 11:26 AM, Liu, Yong wrote:
> > 
> > > -----Original Message-----
> > > From: Jason Wang [mailto:jasowang@redhat.com]
> > > Sent: Tuesday, October 22, 2019 9:06 PM
> > > To: Liu, Yong <yong.liu@intel.com>; mst@redhat.com; Bie, Tiwei
> > > <tiwei.bie@intel.com>
> > > Cc: virtualization@lists.linux-foundation.org
> > > Subject: Re: [PATCH] virtio_ring: fix packed ring event may missing
> > > 
> > > 
> > > On 2019/10/22 2:48 PM, Liu, Yong wrote:
> > > > Hi Jason,
> > > > My answers are inline.
> > > > 
> > > > > -----Original Message-----
> > > > > From: Jason Wang [mailto:jasowang@redhat.com]
> > > > > Sent: Tuesday, October 22, 2019 10:45 AM
> > > > > To: Liu, Yong <yong.liu@intel.com>; mst@redhat.com; Bie, Tiwei
> > > > > <tiwei.bie@intel.com>
> > > > > Cc: virtualization@lists.linux-foundation.org
> > > > > Subject: Re: [PATCH] virtio_ring: fix packed ring event may missing
> > > > > 
> > > > > 
> > > > > On 2019/10/22 1:10 AM, Marvin Liu wrote:
> > > > > > When the callback is delayed, virtio expects that vhost will kick when
> > > > > > rolling over the event offset. A recheck should be performed, as the
> > > > > > used index may exceed the event offset between the status check and the
> > > > > > driver event update.
> > > > > > 
> > > > > > However, it is possible that the flags were not modified if descriptors
> > > > > > are chained or the in_order feature was negotiated, so the flags at the
> > > > > > event offset may not be valid for checking the descriptor's status. Fix
> > > > > > it by using the last used index instead. The Tx queue will be stopped
> > > > > > if there are not enough freed buffers after the recheck.
> > > > > > 
> > > > > > Signed-off-by: Marvin Liu <yong.liu@intel.com>
> > > > > > 
> > > > > > diff --git a/drivers/virtio/virtio_ring.c
> > > b/drivers/virtio/virtio_ring.c
> > > > > > index bdc08244a648..a8041e451e9e 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -1499,9 +1499,6 @@ static bool
> > > > > virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> > > > > >     		 * counter first before updating event flags.
> > > > > >     		 */
> > > > > >     		virtio_wmb(vq->weak_barriers);
> > > > > > -	} else {
> > > > > > -		used_idx = vq->last_used_idx;
> > > > > > -		wrap_counter = vq->packed.used_wrap_counter;
> > > > > >     	}
> > > > > > 
> > > > > >     	if (vq->packed.event_flags_shadow ==
> > > VRING_PACKED_EVENT_FLAG_DISABLE)
> > > > > {
> > > > > > @@ -1518,7 +1515,9 @@ static bool
> > > > > virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> > > > > >     	 */
> > > > > >     	virtio_mb(vq->weak_barriers);
> > > > > > 
> > > > > > -	if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
> > > > > > +	if (is_used_desc_packed(vq,
> > > > > > +				vq->last_used_idx,
> > > > > > +				vq->packed.used_wrap_counter)) {
> > > > > >     		END_USE(vq);
> > > > > >     		return false;
> > > > > >     	}
> > > > > Hi Marvin:
> > > > > 
> > > > > Two questions:
> > > > > 
> > > > > 1) Do we support IN_ORDER in kernel driver?
> > > > > 
> > > > Not supported for now. But the issue can still occur if indirect
> > > > descriptors are disabled and the descs are chained.
> > > > Since packed ring descriptor status must be checked one by one, choosing
> > > > an arbitrary position may cause the issue.
> > > 
> > > 
> > > Right, then it's better to mention IN_ORDER as a future feature.
> > > 
> > > 
> > > > > 2) Should we check IN_ORDER in this case? Otherwise we may end up with
> > > > > an interrupt storm when IN_ORDER is not negotiated.
> > > > The interrupt count will not increase here; the event offset value is
> > > > calculated as before.
> > > > Here we just recheck whether the newly used descs are enough for the
> > > > next round of xmit.
> > > > If the backend is slow, most likely the Tx queue will sleep for a while
> > > > until the used index goes over the event offset.
> > > 
> > > 
> > > Ok, but what if the backend is almost as fast as the guest driver? E.g.
> > > in virtio-net we have:
> > > 
> > >       if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
> > >           netif_stop_subqueue(dev, qnum);
> > >           if (!use_napi &&
> > >               unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> > >               /* More just got used, free them then recheck. */
> > >               free_old_xmit_skbs(sq, false);
> > >               if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
> > >                   netif_start_subqueue(dev, qnum);
> > >                   virtqueue_disable_cb(sq->vq);
> > >               }
> > >           }
> > >       }
> > > 
> > > I worry that we may end up toggling the queue state when
> > > sq->vq->num_free is near 2 + MAX_SKB_FRAGS.
> > > 
> > Yes, in this worst case each packet will add two extra event flags writes. Since the backend only reads this value, the cost won't be too high.
> 
> 
> For the driver, it means extra overhead: atomics, less batching, stats
> updating, etc. For the backend, the cacheline will bounce between two CPUs.
> 
> 
> > Even if we track down the chained descs and figure out whether the desc indexed by the event offset is used, there is still a possibility that the flags are invalid.
> > One case is that the backend can buffer multiple descs by not updating the first one. We cannot guarantee that a later desc's flags are usable until we have checked from the first one.
> 
> 
> In this case, since we've stopped the tx queue, no new buffers are added.
> It doesn't matter whether we get notified when 3/4 or all of the
> descriptors have been used.
> 
> Thanks

Well - checking the next descriptor will likely result in moving the
event index forward, which will thinkably reduce the # of interrupts.
So it's hard to predict which is better.  I'll apply the patch for now
as it's simple and safe.  If someone has the time to work on tuning all
this, that would be great.


> 
> > 
> > Regards,
> > Marvin
> > 
> > > It looks to me that the correct thing to implement is to calculate the
> > > head descriptor of the chain that sits at the 3/4 mark.
> > > 
> > > Thanks
> > > 
> > > 
> > > > Thanks,
> > > > Marvin
> > > > 
> > > > > Thanks
> > > > > 
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


