* [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
2014-10-15 7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
@ 2014-10-15 7:25 ` Jason Wang
2014-10-15 9:34 ` Michael S. Tsirkin
2014-10-15 7:25 ` [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail() Jason Wang
` (5 subsequent siblings)
6 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 7:25 UTC (permalink / raw)
To: rusty, mst, virtualization, linux-kernel, netdev; +Cc: davem, eric.dumazet
This patch checks the new event idx to make sure used event idx never
goes back. This is used to synchronize the calls between
virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 3b1f89b..1b3929f 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
u16 last_used_idx;
START_USE(vq);
-
+ last_used_idx = vq->last_used_idx;
/* We optimistically turn back on interrupts, then check if there was
* more to do. */
/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
* either clear the flags bit or point the event index at the next
* entry. Always do both to keep code simple. */
vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
- vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
+ /* Make sure used event never go backwards */
+ if (!vring_need_event(vring_used_event(&vq->vring),
+ vq->vring.avail->idx, last_used_idx))
+ vring_used_event(&vq->vring) = last_used_idx;
END_USE(vq);
return last_used_idx;
}
--
1.7.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
From: Michael S. Tsirkin @ 2014-10-15 9:34 UTC (permalink / raw)
To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
> This patch checks the new event idx to make sure used event idx never
> goes back. This is used to synchronize the calls between
> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
The implication being that moving the event idx back might cause some race
condition? If yes, please describe the race explicitly.
Is there a bug we need to fix on stable?
Please also explicitly describe a configuration that causes event idx
to go back.
All this info should go in the commit log.
> ---
> drivers/virtio/virtio_ring.c | 7 +++++--
> 1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 3b1f89b..1b3929f 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> u16 last_used_idx;
>
> START_USE(vq);
> -
> + last_used_idx = vq->last_used_idx;
> /* We optimistically turn back on interrupts, then check if there was
> * more to do. */
> /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> * either clear the flags bit or point the event index at the next
> * entry. Always do both to keep code simple. */
> vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> - vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
> + /* Make sure used event never go backwards */
s/go/goes/
> + if (!vring_need_event(vring_used_event(&vq->vring),
> + vq->vring.avail->idx, last_used_idx))
> + vring_used_event(&vq->vring) = last_used_idx;
The result will be that the driver will *not* get an interrupt
on the next consumed buffer, which is likely not what the driver
intended when it called virtqueue_enable_cb.
Instead, how about we simply document the requirement that drivers either
always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
but not both?
> END_USE(vq);
> return last_used_idx;
> }
> --
> 1.7.1
* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
From: Jason Wang @ 2014-10-15 10:13 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
>> This patch checks the new event idx to make sure used event idx never
>> goes back. This is used to synchronize the calls between
>> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
>>
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> the implication being that moving event idx back might cause some race
> condition?
This will cause a race condition when tx interrupt is enabled. Consider
the following case:
1) tx napi was scheduled
2) start_xmit() calls virtqueue_enable_cb_delayed() and disables the cb [used
event is vq->last_used_idx + 3/4 of the pending bufs]
3) tx napi enables the callback through virtqueue_enable_cb() [used event is
vq->last_used_idx]
After step 3, the used event was moved back, and an unnecessary tx interrupt
was triggered.
> If yes but please describe the race explicitly.
> Is there a bug we need to fix on stable?
Looks like not; the current code does not have such a race condition.
> Please also explicitly describe a configuration that causes event idx
> to go back.
>
> All this info should go in the commit log.
Will do this.
>> [...]
>> - vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
>> + /* Make sure used event never go backwards */
> s/go/goes/
>
>> + if (!vring_need_event(vring_used_event(&vq->vring),
>> + vq->vring.avail->idx, last_used_idx))
>> + vring_used_event(&vq->vring) = last_used_idx;
> The result will be that driver will *not* get an interrupt
> on the next consumed buffer, which is likely not what driver
> intended when it called virtqueue_enable_cb.
This will only happen when we want to delay the interrupt for the next few
consumed buffers (virtqueue_enable_cb_delayed() was called). In the
other case, vq->last_used_idx should be ahead of the previous used event. Do
you see any other case?
>
> Instead, how about we simply document the requirement that drivers either
> always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
> but not both?
We need to call them both when tx interrupt is enabled, I believe.
>
>> [...]
* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
From: Michael S. Tsirkin @ 2014-10-15 10:32 UTC (permalink / raw)
To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote:
> On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
> >> This patch checks the new event idx to make sure used event idx never
> >> goes back. This is used to synchronize the calls between
> >> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
> >>
> >> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >> Cc: Michael S. Tsirkin <mst@redhat.com>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > the implication being that moving event idx back might cause some race
> > condition?
>
> This will cause race condition when tx interrupt is enabled. Consider
> the following cases
>
> 1) tx napi was scheduled
> 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used
> event is vq->last_used_idx + 3/4 pending bufs]
> 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is
> vq->last_used_idx ]
>
> After step 3, used event was moved back, unnecessary tx interrupt was
> triggered.
Well, unnecessary interrupts are safe.
With your patch, the caller of virtqueue_enable_cb will not get an
interrupt on the next buffer, which is not safe.
If you don't want an interrupt on the next buffer, don't
call virtqueue_enable_cb.
> > If yes but please describe the race explicitly.
> > Is there a bug we need to fix on stable?
>
> Looks not, current code does not have such race condition.
> > Please also explicitly describe a configuration that causes event idx
> > to go back.
> >
> > All this info should go in the commit log.
>
> Will do this.
> >> [...]
> >> - vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
> >> + /* Make sure used event never go backwards */
> > s/go/goes/
> >
> >> + if (!vring_need_event(vring_used_event(&vq->vring),
> >> + vq->vring.avail->idx, last_used_idx))
> >> + vring_used_event(&vq->vring) = last_used_idx;
> > The result will be that driver will *not* get an interrupt
> > on the next consumed buffer, which is likely not what driver
> > intended when it called virtqueue_enable_cb.
>
> This will only happen when we want to delay the interrupt for next few
> consumed buffers (virtqueue_enable_cb_delayed() was called). For the
> other case, vq->last_used_idx should be ahead of previous used event. Do
> you see any other case?
Call virtqueue_enable_cb_delayed, then later call virtqueue_enable_cb. If
the event index is not updated in virtqueue_enable_cb, the driver will not
get an interrupt on the next buffer.
> >
> > Instead, how about we simply document the requirement that drivers either
> > always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
> > but not both?
>
> We need call them both when tx interrupt is enabled I believe.
Can you pls reply to my patch and document issues you see?
> >
> >> [...]
* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
From: Jason Wang @ 2014-10-15 10:44 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On 10/15/2014 06:32 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote:
>> On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
>>> On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
>>>> This patch checks the new event idx to make sure used event idx never
>>>> goes back. This is used to synchronize the calls between
>>>> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
>>>>
>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> the implication being that moving event idx back might cause some race
>>> condition?
>> This will cause race condition when tx interrupt is enabled. Consider
>> the following cases
>>
>> 1) tx napi was scheduled
>> 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used
>> event is vq->last_used_idx + 3/4 pending bufs]
>> 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is
>> vq->last_used_idx ]
>>
>> After step 3, used event was moved back, unnecessary tx interrupt was
>> triggered.
> Well unnecessary interrupts are safe.
But that is what we want to reduce.
> With your patch caller of virtqueue_enable_cb will not get an
> interrupt on the next buffer which is not safe.
>
> If you don't want an interrupt on the next buffer, don't
> call virtqueue_enable_cb.
So something like this patch should be done somewhere in the virtio
core. Virtio-net cannot do this since it has no knowledge of the
event index.
>
>>> If yes but please describe the race explicitly.
>>> Is there a bug we need to fix on stable?
>> Looks not, current code does not have such race condition.
>>> Please also explicitly describe a configuration that causes event idx
>>> to go back.
>>>
>>> All this info should go in the commit log.
>> Will do this.
>>>> [...]
>>>> - vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
>>>> + /* Make sure used event never go backwards */
>>> s/go/goes/
>>>
>>>> + if (!vring_need_event(vring_used_event(&vq->vring),
>>>> + vq->vring.avail->idx, last_used_idx))
>>>> + vring_used_event(&vq->vring) = last_used_idx;
>>> The result will be that driver will *not* get an interrupt
>>> on the next consumed buffer, which is likely not what driver
>>> intended when it called virtqueue_enable_cb.
>> This will only happen when we want to delay the interrupt for next few
>> consumed buffers (virtqueue_enable_cb_delayed() was called). For the
>> other case, vq->last_used_idx should be ahead of previous used event. Do
>> you see any other case?
> Call virtqueue_enable_cb_delayed, later call virtqueue_enable_cb. If
> event index is not updated in virtqueue_enable_cb, driver will not get
> an interrupt on the next buffer.
This is just what we want, I think. The interrupt is not lost but fires
after 3/4 of the pending buffers are consumed. Do you see any real issue with this?
>
>>> Instead, how about we simply document the requirement that drivers either
>>> always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
>>> but not both?
>> We need call them both when tx interrupt is enabled I believe.
> Can you pls reply to my patch and document issues you see?
>
In the previous reply you said you're using
virtqueue_enable_cb_delayed(), so there is no race in your patch.
* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
From: Michael S. Tsirkin @ 2014-10-15 11:38 UTC (permalink / raw)
To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 06:44:41PM +0800, Jason Wang wrote:
> On 10/15/2014 06:32 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote:
> >> On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
> >>>> This patch checks the new event idx to make sure used event idx never
> >>>> goes back. This is used to synchronize the calls between
> >>>> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
> >>>>
> >>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >>>> Cc: Michael S. Tsirkin <mst@redhat.com>
> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>> the implication being that moving event idx back might cause some race
> >>> condition?
> >> This will cause race condition when tx interrupt is enabled. Consider
> >> the following cases
> >>
> >> 1) tx napi was scheduled
> >> 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used
> >> event is vq->last_used_idx + 3/4 pending bufs]
> >> 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is
> >> vq->last_used_idx ]
> >>
> >> After step 3, used event was moved back, unnecessary tx interrupt was
> >> triggered.
> > Well unnecessary interrupts are safe.
>
> But that is what we want to reduce.
It's all about correctness. I don't think mixing enable_cb
and enable_cb_delayed makes sense, let's just make
virtio behave correctly if that happens, no need to
optimize for that.
> > With your patch caller of virtqueue_enable_cb will not get an
> > interrupt on the next buffer which is not safe.
> >
> > If you don't want an interrupt on the next buffer, don't
> > call virtqueue_enable_cb.
>
> So something like this patch should be done in virtio core somewhere
> else. Virtio-net can not do this since it does not have the knowledge of
> event index.
Take a look at my patch - no calls to enable_cb, only
enable_cb_delayed, so we should be fine.
> >
> >>> If yes but please describe the race explicitly.
> >>> Is there a bug we need to fix on stable?
> >> Looks not, current code does not have such race condition.
> >>> Please also explicitly describe a configuration that causes event idx
> >>> to go back.
> >>>
> >>> All this info should go in the commit log.
> >> Will do this.
> >>>> [...]
> >>>> - vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
> >>>> + /* Make sure used event never go backwards */
> >>> s/go/goes/
> >>>
> >>>> + if (!vring_need_event(vring_used_event(&vq->vring),
> >>>> + vq->vring.avail->idx, last_used_idx))
> >>>> + vring_used_event(&vq->vring) = last_used_idx;
> >>> The result will be that driver will *not* get an interrupt
> >>> on the next consumed buffer, which is likely not what driver
> >>> intended when it called virtqueue_enable_cb.
> >> This will only happen when we want to delay the interrupt for next few
> >> consumed buffers (virtqueue_enable_cb_delayed() was called). For the
> >> other case, vq->last_used_idx should be ahead of previous used event. Do
> >> you see any other case?
> > Call virtqueue_enable_cb_delayed, later call virtqueue_enable_cb. If
> > event index is not updated in virtqueue_enable_cb, driver will not get
> > an interrupt on the next buffer.
>
> This is just what we want I think. The interrupt was not lost but fired
> after 3/4 pending buffers were consumed. Do you see any real issue on this?
Yes, this violates the API. For example, the device might never
consume the rest of the buffers.
> >
> >>> Instead, how about we simply document the requirement that drivers either
> >>> always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
> >>> but not both?
> >> We need call them both when tx interrupt is enabled I believe.
> > Can you pls reply to my patch and document issues you see?
> >
>
> In the previous reply you said you're using
> virtqueue_enable_cb_delayed(), so no race in your patch.
OK so you think my patch is also correct, but that yours gives better
efficiency?
--
MST
* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
From: Jason Wang @ 2014-10-17 5:04 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On 10/15/2014 07:38 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 06:44:41PM +0800, Jason Wang wrote:
>> On 10/15/2014 06:32 PM, Michael S. Tsirkin wrote:
>>> On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote:
>>>> On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
>>>>>> This patch checks the new event idx to make sure used event idx never
>>>>>> goes back. This is used to synchronize the calls between
>>>>>> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
>>>>>>
>>>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> the implication being that moving event idx back might cause some race
>>>>> condition?
>>>> This will cause race condition when tx interrupt is enabled. Consider
>>>> the following cases
>>>>
>>>> 1) tx napi was scheduled
>>>> 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used
> >>>> event is vq->last_used_idx + 3/4 pending bufs]
>>>> 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is
>>>> vq->last_used_idx ]
>>>>
>>>> After step 3, used event was moved back, unnecessary tx interrupt was
>>>> triggered.
>>> Well unnecessary interrupts are safe.
>> But that is what we want to reduce.
> It's all about correctness. I don't think mixing enable_cb
> and enable_cb_delayed makes sense, let's just make
> virtio behave correctly if that happens, no need to
> optimize for that.
Then, as you said, we need to document this or add a WARN_ON() or BUG() in
case both of the two are used.
>
>
>>> With your patch caller of virtqueue_enable_cb will not get an
>>> interrupt on the next buffer which is not safe.
>>>
>>> If you don't want an interrupt on the next buffer, don't
>>> call virtqueue_enable_cb.
>> So something like this patch should be done in virtio core somewhere
>> else. Virtio-net can not do this since it does not have the knowledge of
>> event index.
> Take a look at my patch - no calls to enable_cb, only
> enable_cb_delayed, so we should be fine.
>
>>>>> If yes but please describe the race explicitly.
>>>>> Is there a bug we need to fix on stable?
>>>> Looks not, current code does not have such race condition.
>>>>> Please also explicitly describe a configuration that causes event idx
>>>>> to go back.
>>>>>
>>>>> All this info should go in the commit log.
>>>> Will do this.
>>>>>> [...]
>>>>>> - vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
>>>>>> + /* Make sure used event never go backwards */
>>>>> s/go/goes/
>>>>>
>>>>>> + if (!vring_need_event(vring_used_event(&vq->vring),
>>>>>> + vq->vring.avail->idx, last_used_idx))
>>>>>> + vring_used_event(&vq->vring) = last_used_idx;
>>>>> The result will be that driver will *not* get an interrupt
>>>>> on the next consumed buffer, which is likely not what driver
>>>>> intended when it called virtqueue_enable_cb.
>>>> This will only happen when we want to delay the interrupt for next few
>>>> consumed buffers (virtqueue_enable_cb_delayed() was called). For the
>>>> other case, vq->last_used_idx should be ahead of previous used event. Do
>>>> you see any other case?
>>> Call virtqueue_enable_cb_delayed, later call virtqueue_enable_cb. If
>>> event index is not updated in virtqueue_enable_cb, driver will not get
>>> an interrupt on the next buffer.
>> This is just what we want I think. The interrupt was not lost but fired
>> after 3/4 pending buffers were consumed. Do you see any real issue on this?
> Yes, this violates the API. For example device might never
> consume the rest of buffers.
Then it would be a bug in the device, which is out of the guest's control.
If not, the device might also never consume the remaining 3/4 of the buffers.
>
>>>>> Instead, how about we simply document the requirement that drivers either
>>>>> always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
>>>>> but not both?
>>>> We need call them both when tx interrupt is enabled I believe.
>>> Can you pls reply to my patch and document issues you see?
>>>
>> In the previous reply you said you're using
>> virtqueue_enable_cb_delayed(), so no race in your patch.
> OK so you think my patch is also correct, but that yours gives better
> efficiency?
>
We need some benchmarks to see the difference, I think.
* [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
From: Jason Wang @ 2014-10-15 7:25 UTC (permalink / raw)
To: rusty, mst, virtualization, linux-kernel, netdev; +Cc: davem, eric.dumazet
This patch introduces virtio_enable_cb_avail() to publish the avail idx
as the used event. This could be used by batched buffer submission to
reduce the number of tx interrupts.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 22 ++++++++++++++++++++--
include/linux/virtio.h | 2 ++
2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1b3929f..d67fbf8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
* entry. Always do both to keep code simple. */
vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
/* Make sure used event never go backwards */
- if (!vring_need_event(vring_used_event(&vq->vring),
- vq->vring.avail->idx, last_used_idx))
+ if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
+ !vring_need_event(vring_used_event(&vq->vring),
+ vq->vring.avail->idx, last_used_idx)) {
vring_used_event(&vq->vring) = last_used_idx;
+ }
END_USE(vq);
return last_used_idx;
}
EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
+bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ bool ret;
+
+ START_USE(vq);
+ vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
+ vring_used_event(&vq->vring) = vq->vring.avail->idx;
+ ret = vring_need_event(vq->vring.avail->idx,
+ vq->last_used_idx, vq->vring.used->idx);
+ END_USE(vq);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
+
/**
* virtqueue_poll - query pending used buffers
* @vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b46671e..bfaf058 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
+bool virtqueue_enable_cb_avail(struct virtqueue *vq);
+
bool virtqueue_poll(struct virtqueue *vq, unsigned);
bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
--
1.7.1
* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
From: Michael S. Tsirkin @ 2014-10-15 9:28 UTC (permalink / raw)
To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
> This patch introduces virtio_enable_cb_avail() to publish avail idx
> and used event. This could be used by batched buffer submitting to
> reduce the number of tx interrupts.
>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/virtio/virtio_ring.c | 22 ++++++++++++++++++++--
> include/linux/virtio.h | 2 ++
> 2 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 1b3929f..d67fbf8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> * entry. Always do both to keep code simple. */
> vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> /* Make sure used event never go backwards */
> - if (!vring_need_event(vring_used_event(&vq->vring),
> - vq->vring.avail->idx, last_used_idx))
> + if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
> + !vring_need_event(vring_used_event(&vq->vring),
> + vq->vring.avail->idx, last_used_idx)) {
> vring_used_event(&vq->vring) = last_used_idx;
> + }
> END_USE(vq);
> return last_used_idx;
> }
> EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
>
I see you are also changing virtqueue_enable_cb_prepare, why?
> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + bool ret;
> +
> + START_USE(vq);
> + vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> + vring_used_event(&vq->vring) = vq->vring.avail->idx;
> + ret = vring_need_event(vq->vring.avail->idx,
> + vq->last_used_idx, vq->vring.used->idx);
> + END_USE(vq);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
> +
> /**
> * virtqueue_poll - query pending used buffers
> * @vq: the struct virtqueue we're talking about.
Could not figure out what this does.
Please add documentation.
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index b46671e..bfaf058 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
>
> unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
>
> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
> +
> bool virtqueue_poll(struct virtqueue *vq, unsigned);
>
> bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> --
> 1.7.1
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
2014-10-15 9:28 ` Michael S. Tsirkin
@ 2014-10-15 10:19 ` Jason Wang
2014-10-15 10:41 ` Michael S. Tsirkin
0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 10:19 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
>> This patch introduces virtio_enable_cb_avail() to publish avail idx
>> and used event. This could be used by batched buffer submitting to
>> reduce the number of tx interrupts.
>>
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/virtio/virtio_ring.c | 22 ++++++++++++++++++++--
>> include/linux/virtio.h | 2 ++
>> 2 files changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>> index 1b3929f..d67fbf8 100644
>> --- a/drivers/virtio/virtio_ring.c
>> +++ b/drivers/virtio/virtio_ring.c
>> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>> * entry. Always do both to keep code simple. */
>> vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>> /* Make sure used event never go backwards */
>> - if (!vring_need_event(vring_used_event(&vq->vring),
>> - vq->vring.avail->idx, last_used_idx))
>> + if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
>> + !vring_need_event(vring_used_event(&vq->vring),
>> + vq->vring.avail->idx, last_used_idx)) {
>> vring_used_event(&vq->vring) = last_used_idx;
>> + }
>> END_USE(vq);
>> return last_used_idx;
>> }
>> EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
>>
> I see you are also changing virtqueue_enable_cb_prepare, why?
This is also used to prevent it from moving the used event backwards.
This may happen when we handle tx napi after we have published the avail
idx as the used event (i.e. after virtqueue_enable_cb_avail() was called).
>
>> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
>> +{
>> + struct vring_virtqueue *vq = to_vvq(_vq);
>> + bool ret;
>> +
>> + START_USE(vq);
>> + vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>> + vring_used_event(&vq->vring) = vq->vring.avail->idx;
>> + ret = vring_need_event(vq->vring.avail->idx,
>> + vq->last_used_idx, vq->vring.used->idx);
>> + END_USE(vq);
>> +
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
>> +
>> /**
>> * virtqueue_poll - query pending used buffers
>> * @vq: the struct virtqueue we're talking about.
> Could not figure out what this does.
> Please add documentation.
>
Sure, does something like the following explain what this function does?
/**
 * virtqueue_enable_cb_avail - restart callbacks after disable_cb.
 * @vq: the struct virtqueue we're talking about.
 *
 * This re-enables callbacks but hints to the other side to delay
 * interrupts until all of the available buffers have been processed;
 * it returns "false" if there is at least one pending buffer in the
 * queue, to detect a possible race between the driver checking for
 * more work, and enabling callbacks.
 *
 * Caller must ensure we don't call this with other virtqueue
 * operations at the same time (except where noted).
 */
>> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
>> index b46671e..bfaf058 100644
>> --- a/include/linux/virtio.h
>> +++ b/include/linux/virtio.h
>> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
>>
>> unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
>>
>> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
>> +
>> bool virtqueue_poll(struct virtqueue *vq, unsigned);
>>
>> bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
>> --
>> 1.7.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
2014-10-15 10:19 ` Jason Wang
@ 2014-10-15 10:41 ` Michael S. Tsirkin
2014-10-15 10:58 ` Jason Wang
0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:41 UTC (permalink / raw)
To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote:
> On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
> >> This patch introduces virtio_enable_cb_avail() to publish avail idx
> >> and used event. This could be used by batched buffer submitting to
> >> reduce the number of tx interrupts.
> >>
> >> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >> Cc: Michael S. Tsirkin <mst@redhat.com>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> ---
> >> drivers/virtio/virtio_ring.c | 22 ++++++++++++++++++++--
> >> include/linux/virtio.h | 2 ++
> >> 2 files changed, 22 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >> index 1b3929f..d67fbf8 100644
> >> --- a/drivers/virtio/virtio_ring.c
> >> +++ b/drivers/virtio/virtio_ring.c
> >> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >> * entry. Always do both to keep code simple. */
> >> vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >> /* Make sure used event never go backwards */
> >> - if (!vring_need_event(vring_used_event(&vq->vring),
> >> - vq->vring.avail->idx, last_used_idx))
> >> + if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
> >> + !vring_need_event(vring_used_event(&vq->vring),
> >> + vq->vring.avail->idx, last_used_idx)) {
> >> vring_used_event(&vq->vring) = last_used_idx;
> >> + }
> >> END_USE(vq);
> >> return last_used_idx;
> >> }
> >> EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
> >>
> > I see you are also changing virtqueue_enable_cb_prepare, why?
>
> This is also used to prevent it from moving the used event backwards.
> This may happens when we handle tx napi after we publish avail idx as
> used event (virtqueue_enable_cb_avail() was called).
So it's wrong exactly in the same way.
But also, please document this stuff, don't put
unrelated changes in a patch called "introduce
virtqueue_enable_cb_avail".
> >
> >> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
> >> +{
> >> + struct vring_virtqueue *vq = to_vvq(_vq);
> >> + bool ret;
> >> +
> >> + START_USE(vq);
> >> + vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >> + vring_used_event(&vq->vring) = vq->vring.avail->idx;
> >> + ret = vring_need_event(vq->vring.avail->idx,
> >> + vq->last_used_idx, vq->vring.used->idx);
> >> + END_USE(vq);
> >> +
> >> + return ret;
> >> +}
> >> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
> >> +
> >> /**
> >> * virtqueue_poll - query pending used buffers
> >> * @vq: the struct virtqueue we're talking about.
> > Could not figure out what this does.
> > Please add documentation.
> >
>
> Sure, does something like below explain what does this function do?
>
> /**
>  * virtqueue_enable_cb_avail - restart callbacks after disable_cb.
>  * @vq: the struct virtqueue we're talking about.
>  *
>  * This re-enables callbacks but hints to the other side to delay
>  * interrupts until all of the available buffers have been processed;
So this is like virtqueue_enable_cb_delayed but even more
aggressive?
I think it's too agressive: it's better to wake up guest
after you are through most of buffers, but not all,
so guest and host can work in parallel.
>  * it returns "false" if there is at least one pending buffer in the
>  * queue, to detect a possible race between the driver checking for
>  * more work, and enabling callbacks.
>  *
>  * Caller must ensure we don't call this with other virtqueue
>  * operations at the same time (except where noted).
>  */
>
> >> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> >> index b46671e..bfaf058 100644
> >> --- a/include/linux/virtio.h
> >> +++ b/include/linux/virtio.h
> >> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> >>
> >> unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> >>
> >> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
> >> +
> >> bool virtqueue_poll(struct virtqueue *vq, unsigned);
> >>
> >> bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> >> --
> >> 1.7.1
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
2014-10-15 10:41 ` Michael S. Tsirkin
@ 2014-10-15 10:58 ` Jason Wang
2014-10-15 11:43 ` Michael S. Tsirkin
0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 10:58 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On 10/15/2014 06:41 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote:
>> On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
>>> On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
>>>> This patch introduces virtio_enable_cb_avail() to publish avail idx
>>>> and used event. This could be used by batched buffer submitting to
>>>> reduce the number of tx interrupts.
>>>>
>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> ---
>>>> drivers/virtio/virtio_ring.c | 22 ++++++++++++++++++++--
>>>> include/linux/virtio.h | 2 ++
>>>> 2 files changed, 22 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>>> index 1b3929f..d67fbf8 100644
>>>> --- a/drivers/virtio/virtio_ring.c
>>>> +++ b/drivers/virtio/virtio_ring.c
>>>> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>>>> * entry. Always do both to keep code simple. */
>>>> vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>>>> /* Make sure used event never go backwards */
>>>> - if (!vring_need_event(vring_used_event(&vq->vring),
>>>> - vq->vring.avail->idx, last_used_idx))
>>>> + if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
>>>> + !vring_need_event(vring_used_event(&vq->vring),
>>>> + vq->vring.avail->idx, last_used_idx)) {
>>>> vring_used_event(&vq->vring) = last_used_idx;
>>>> + }
>>>> END_USE(vq);
>>>> return last_used_idx;
>>>> }
>>>> EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
>>>>
>>> I see you are also changing virtqueue_enable_cb_prepare, why?
>> This is also used to prevent it from moving the used event backwards.
>> This may happens when we handle tx napi after we publish avail idx as
>> used event (virtqueue_enable_cb_avail() was called).
> So it's wrong exactly in the same way.
>
> But also, please document this stuff, don't put
> unrelated changes in a patch called "introduce
> virtqueue_enable_cb_avail".
>
>
>>>> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
>>>> +{
>>>> + struct vring_virtqueue *vq = to_vvq(_vq);
>>>> + bool ret;
>>>> +
>>>> + START_USE(vq);
>>>> + vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>>>> + vring_used_event(&vq->vring) = vq->vring.avail->idx;
>>>> + ret = vring_need_event(vq->vring.avail->idx,
>>>> + vq->last_used_idx, vq->vring.used->idx);
>>>> + END_USE(vq);
>>>> +
>>>> + return ret;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
>>>> +
>>>> /**
>>>> * virtqueue_poll - query pending used buffers
>>>> * @vq: the struct virtqueue we're talking about.
>>> Could not figure out what this does.
>>> Please add documentation.
>>>
>> Sure, does something like below explain what does this function do?
>>
>> /**
>>  * virtqueue_enable_cb_avail - restart callbacks after disable_cb.
>>  * @vq: the struct virtqueue we're talking about.
>>  *
>>  * This re-enables callbacks but hints to the other side to delay
>>  * interrupts until all of the available buffers have been processed;
>
> So this is like virtqueue_enable_cb_delayed but even more
> aggressive?
> I think it's too agressive: it's better to wake up guest
> after you are through most of buffers, but not all,
> so guest and host can work in parallel.
Note that:
- it is only used while there are still enough free slots left (more
than 2 + MAX_SKB_FRAGS)
- my patch keeps the free_old_xmit_skbs() at the beginning of
start_xmit(), so tx skb reclaiming does not depend entirely on the tx
interrupt. If more packets come, we'd expect some of them to be freed in
ndo_start_xmit(). If not, we may finally trigger
virtqueue_enable_cb_delayed().
So it's probably not as aggressive as it looks. I will benchmark this.
>
>
>>  * it returns "false" if there is at least one pending buffer in the
>>  * queue, to detect a possible race between the driver checking for
>>  * more work, and enabling callbacks.
>>  *
>>  * Caller must ensure we don't call this with other virtqueue
>>  * operations at the same time (except where noted).
>>  */
>>
>>>> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
>>>> index b46671e..bfaf058 100644
>>>> --- a/include/linux/virtio.h
>>>> +++ b/include/linux/virtio.h
>>>> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
>>>>
>>>> unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
>>>>
>>>> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
>>>> +
>>>> bool virtqueue_poll(struct virtqueue *vq, unsigned);
>>>>
>>>> bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
>>>> --
>>>> 1.7.1
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
2014-10-15 10:58 ` Jason Wang
@ 2014-10-15 11:43 ` Michael S. Tsirkin
0 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 11:43 UTC (permalink / raw)
To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 06:58:15PM +0800, Jason Wang wrote:
> On 10/15/2014 06:41 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote:
> >> On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
> >>>> This patch introduces virtio_enable_cb_avail() to publish avail idx
> >>>> and used event. This could be used by batched buffer submitting to
> >>>> reduce the number of tx interrupts.
> >>>>
> >>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >>>> Cc: Michael S. Tsirkin <mst@redhat.com>
> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>>> ---
> >>>> drivers/virtio/virtio_ring.c | 22 ++++++++++++++++++++--
> >>>> include/linux/virtio.h | 2 ++
> >>>> 2 files changed, 22 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >>>> index 1b3929f..d67fbf8 100644
> >>>> --- a/drivers/virtio/virtio_ring.c
> >>>> +++ b/drivers/virtio/virtio_ring.c
> >>>> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >>>> * entry. Always do both to keep code simple. */
> >>>> vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >>>> /* Make sure used event never go backwards */
> >>>> - if (!vring_need_event(vring_used_event(&vq->vring),
> >>>> - vq->vring.avail->idx, last_used_idx))
> >>>> + if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
> >>>> + !vring_need_event(vring_used_event(&vq->vring),
> >>>> + vq->vring.avail->idx, last_used_idx)) {
> >>>> vring_used_event(&vq->vring) = last_used_idx;
> >>>> + }
> >>>> END_USE(vq);
> >>>> return last_used_idx;
> >>>> }
> >>>> EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
> >>>>
> >>> I see you are also changing virtqueue_enable_cb_prepare, why?
> >> This is also used to prevent it from moving the used event backwards.
> >> This may happens when we handle tx napi after we publish avail idx as
> >> used event (virtqueue_enable_cb_avail() was called).
> > So it's wrong exactly in the same way.
> >
> > But also, please document this stuff, don't put
> > unrelated changes in a patch called "introduce
> > virtqueue_enable_cb_avail".
> >
> >
> >>>> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
> >>>> +{
> >>>> + struct vring_virtqueue *vq = to_vvq(_vq);
> >>>> + bool ret;
> >>>> +
> >>>> + START_USE(vq);
> >>>> + vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >>>> + vring_used_event(&vq->vring) = vq->vring.avail->idx;
> >>>> + ret = vring_need_event(vq->vring.avail->idx,
> >>>> + vq->last_used_idx, vq->vring.used->idx);
> >>>> + END_USE(vq);
> >>>> +
> >>>> + return ret;
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
> >>>> +
> >>>> /**
> >>>> * virtqueue_poll - query pending used buffers
> >>>> * @vq: the struct virtqueue we're talking about.
> >>> Could not figure out what this does.
> >>> Please add documentation.
> >>>
> >> Sure, does something like below explain what does this function do?
> >>
> >> /**
> >>  * virtqueue_enable_cb_avail - restart callbacks after disable_cb.
> >>  * @vq: the struct virtqueue we're talking about.
> >>  *
> >>  * This re-enables callbacks but hints to the other side to delay
> >>  * interrupts until all of the available buffers have been processed;
> >
> > So this is like virtqueue_enable_cb_delayed but even more
> > aggressive?
> > I think it's too agressive: it's better to wake up guest
> > after you are through most of buffers, but not all,
> > so guest and host can work in parallel.
>
> Note that:
>
> - it was only used when there are still few of free slots (which is
> greater than 2 + MAX_SKB_FRAGS)
> - my patch keeps the free_old_xmit_skbs() in the beginning of
> start_xmit(), so the tx skb reclaiming does not depends totally on tx
> interrupt. If more packets comes, we'd expect some of them were freed in
> ndo_start_xmit(). If not, finally we may trigger the
> virtqueue_enable_cb_delayed().
>
> So probably not as aggressive as it looks. I will do benchmark on this.
Mine too:
	} else if (virtqueue_enable_cb_delayed(sq->vq)) {
		free_old_xmit_skbs(txq, sq, qsize);
	}
> >
> >
> >>  * it returns "false" if there is at least one pending buffer in the
> >>  * queue, to detect a possible race between the driver checking for
> >>  * more work, and enabling callbacks.
> >>  *
> >>  * Caller must ensure we don't call this with other virtqueue
> >>  * operations at the same time (except where noted).
> >>  */
> >>
> >>>> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> >>>> index b46671e..bfaf058 100644
> >>>> --- a/include/linux/virtio.h
> >>>> +++ b/include/linux/virtio.h
> >>>> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> >>>>
> >>>> unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> >>>>
> >>>> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
> >>>> +
> >>>> bool virtqueue_poll(struct virtqueue *vq, unsigned);
> >>>>
> >>>> bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> >>>> --
> >>>> 1.7.1
^ permalink raw reply [flat|nested] 36+ messages in thread
* [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
2014-10-15 7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
2014-10-15 7:25 ` [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards Jason Wang
2014-10-15 7:25 ` [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail() Jason Wang
@ 2014-10-15 7:25 ` Jason Wang
2014-10-15 9:36 ` Eric Dumazet
2014-10-15 9:37 ` Michael S. Tsirkin
2014-10-15 7:25 ` [RFC PATCH net-next 4/6] virtio-net: return the number of packets sent in free_old_xmit_skbs() Jason Wang
` (3 subsequent siblings)
6 siblings, 2 replies; 36+ messages in thread
From: Jason Wang @ 2014-10-15 7:25 UTC (permalink / raw)
To: rusty, mst, virtualization, linux-kernel, netdev; +Cc: davem, eric.dumazet
Accumulate the sent packets and sent bytes in local variables and perform a
single u64_stats_update_begin/end() after.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..a4d56b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
unsigned int len;
struct virtnet_info *vi = sq->vq->vdev->priv;
struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+ u64 tx_bytes = 0, tx_packets = 0;
while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
pr_debug("Sent skb %p\n", skb);
- u64_stats_update_begin(&stats->tx_syncp);
- stats->tx_bytes += skb->len;
- stats->tx_packets++;
- u64_stats_update_end(&stats->tx_syncp);
+ tx_bytes += skb->len;
+ tx_packets++;
dev_kfree_skb_any(skb);
}
+
+ u64_stats_update_begin(&stats->tx_syncp);
+ stats->tx_bytes += tx_bytes;
+ stats->tx_packets =+ tx_packets;
+ u64_stats_update_end(&stats->tx_syncp);
}
static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
--
1.7.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
2014-10-15 7:25 ` [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs() Jason Wang
@ 2014-10-15 9:36 ` Eric Dumazet
2014-10-15 9:37 ` Michael S. Tsirkin
1 sibling, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2014-10-15 9:36 UTC (permalink / raw)
To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, davem
On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:
> Accumulate the sent packets and sent bytes in local variables and perform a
> single u64_stats_update_begin/end() after.
>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/net/virtio_net.c | 12 ++++++++----
> 1 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3d0ce44..a4d56b8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> unsigned int len;
> struct virtnet_info *vi = sq->vq->vdev->priv;
> struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> + u64 tx_bytes = 0, tx_packets = 0;
>
> while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> pr_debug("Sent skb %p\n", skb);
>
> - u64_stats_update_begin(&stats->tx_syncp);
> - stats->tx_bytes += skb->len;
> - stats->tx_packets++;
> - u64_stats_update_end(&stats->tx_syncp);
> + tx_bytes += skb->len;
> + tx_packets++;
>
> dev_kfree_skb_any(skb);
> }
> +
> + u64_stats_update_begin(&stats->tx_syncp);
> + stats->tx_bytes += tx_bytes;
> + stats->tx_packets =+ tx_packets;
stats->tx_packets += tx_packets;
> + u64_stats_update_end(&stats->tx_syncp);
> }
>
> static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
2014-10-15 7:25 ` [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs() Jason Wang
2014-10-15 9:36 ` Eric Dumazet
@ 2014-10-15 9:37 ` Michael S. Tsirkin
2014-10-15 9:49 ` David Laight
1 sibling, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 9:37 UTC (permalink / raw)
To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> Accumulate the sent packets and sent bytes in local variables and perform a
> single u64_stats_update_begin/end() after.
>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Not sure how much it's worth but since Eric suggested it ...
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/net/virtio_net.c | 12 ++++++++----
> 1 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3d0ce44..a4d56b8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> unsigned int len;
> struct virtnet_info *vi = sq->vq->vdev->priv;
> struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> + u64 tx_bytes = 0, tx_packets = 0;
>
> while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> pr_debug("Sent skb %p\n", skb);
>
> - u64_stats_update_begin(&stats->tx_syncp);
> - stats->tx_bytes += skb->len;
> - stats->tx_packets++;
> - u64_stats_update_end(&stats->tx_syncp);
> + tx_bytes += skb->len;
> + tx_packets++;
>
> dev_kfree_skb_any(skb);
> }
> +
> + u64_stats_update_begin(&stats->tx_syncp);
> + stats->tx_bytes += tx_bytes;
> + stats->tx_packets =+ tx_packets;
> + u64_stats_update_end(&stats->tx_syncp);
> }
>
> static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> --
> 1.7.1
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
2014-10-15 9:37 ` Michael S. Tsirkin
@ 2014-10-15 9:49 ` David Laight
2014-10-15 10:48 ` Michael S. Tsirkin
0 siblings, 1 reply; 36+ messages in thread
From: David Laight @ 2014-10-15 9:49 UTC (permalink / raw)
To: 'Michael S. Tsirkin', Jason Wang
Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
From: Of Michael S. Tsirkin
> On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> > Accumulate the sent packets and sent bytes in local variables and perform a
> > single u64_stats_update_begin/end() after.
> >
> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>
> Not sure how much it's worth but since Eric suggested it ...
Probably depends on the actual cost of u64_stats_update_begin/end
against the likely extra saving of the tx_bytes and tx_packets
values onto the stack across the call to dev_kfree_skb_any().
(Which depends on the number of caller saved registers.)
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>
> > ---
> > drivers/net/virtio_net.c | 12 ++++++++----
> > 1 files changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 3d0ce44..a4d56b8 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> > unsigned int len;
> > struct virtnet_info *vi = sq->vq->vdev->priv;
> > struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > + u64 tx_bytes = 0, tx_packets = 0;
tx_packets need only be 'unsigned int'.
The same is almost certainly true of tx_bytes.
David
> > while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > pr_debug("Sent skb %p\n", skb);
> >
> > - u64_stats_update_begin(&stats->tx_syncp);
> > - stats->tx_bytes += skb->len;
> > - stats->tx_packets++;
> > - u64_stats_update_end(&stats->tx_syncp);
> > + tx_bytes += skb->len;
> > + tx_packets++;
> >
> > dev_kfree_skb_any(skb);
> > }
> > +
> > + u64_stats_update_begin(&stats->tx_syncp);
> > + stats->tx_bytes += tx_bytes;
> > + stats->tx_packets =+ tx_packets;
> > + u64_stats_update_end(&stats->tx_syncp);
> > }
> >
> > static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > --
> > 1.7.1
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
2014-10-15 9:49 ` David Laight
@ 2014-10-15 10:48 ` Michael S. Tsirkin
2014-10-15 10:51 ` David Laight
0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:48 UTC (permalink / raw)
To: David Laight; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 09:49:01AM +0000, David Laight wrote:
> From: Of Michael S. Tsirkin
> > On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> > > Accumulate the sent packets and sent bytes in local variables and perform a
> > > single u64_stats_update_begin/end() after.
> > >
> > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> >
> > Not sure how much it's worth but since Eric suggested it ...
>
> Probably depends on the actual cost of u64_stats_update_begin/end
> against the likely extra saving of the tx_bytes and tx_packets
> values onto the stack across the call to dev_kfree_skb_any().
> (Which depends on the number of caller saved registers.)
Yea, some benchmark results would be nice to see.
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> >
> > > ---
> > > drivers/net/virtio_net.c | 12 ++++++++----
> > > 1 files changed, 8 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 3d0ce44..a4d56b8 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> > > unsigned int len;
> > > struct virtnet_info *vi = sq->vq->vdev->priv;
> > > struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > > + u64 tx_bytes = 0, tx_packets = 0;
>
> tx_packets need only be 'unsigned int'.
> The same is almost certainly true of tx_bytes.
>
> David
>
> > > while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > pr_debug("Sent skb %p\n", skb);
> > >
> > > - u64_stats_update_begin(&stats->tx_syncp);
> > > - stats->tx_bytes += skb->len;
> > > - stats->tx_packets++;
> > > - u64_stats_update_end(&stats->tx_syncp);
> > > + tx_bytes += skb->len;
> > > + tx_packets++;
> > >
> > > dev_kfree_skb_any(skb);
> > > }
> > > +
> > > + u64_stats_update_begin(&stats->tx_syncp);
> > > + stats->tx_bytes += tx_bytes;
> > > + stats->tx_packets =+ tx_packets;
> > > + u64_stats_update_end(&stats->tx_syncp);
> > > }
> > >
> > > static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > > --
> > > 1.7.1
>
* RE: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
2014-10-15 10:48 ` Michael S. Tsirkin
@ 2014-10-15 10:51 ` David Laight
2014-10-15 12:00 ` Michael S. Tsirkin
0 siblings, 1 reply; 36+ messages in thread
From: David Laight @ 2014-10-15 10:51 UTC (permalink / raw)
To: 'Michael S. Tsirkin'
Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
From: Michael S. Tsirkin
> On Wed, Oct 15, 2014 at 09:49:01AM +0000, David Laight wrote:
> > From: Of Michael S. Tsirkin
> > > On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> > > > Accumulate the sent packets and sent bytes in local variables and perform a
> > > > single u64_stats_update_begin/end() after.
> > > >
> > > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > >
> > > Not sure how much it's worth but since Eric suggested it ...
> >
> > Probably depends on the actual cost of u64_stats_update_begin/end
> > against the likely extra saving of the tx_bytes and tx_packets
> > values onto the stack across the call to dev_kfree_skb_any().
> > (Which depends on the number of caller saved registers.)
>
> Yea, some benchmark results would be nice to see.
If there are likely to be multiple skbs on the queue, the fastest
code would probably do one 'virtqueue_get_all()' that returned
a linked list of buffers, then follow the list to get the stats,
and follow it again to free the skbs.
David
> > > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > >
> > > > ---
> > > > drivers/net/virtio_net.c | 12 ++++++++----
> > > > 1 files changed, 8 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 3d0ce44..a4d56b8 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> > > > unsigned int len;
> > > > struct virtnet_info *vi = sq->vq->vdev->priv;
> > > > struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > > > + u64 tx_bytes = 0, tx_packets = 0;
> >
> > tx_packets need only be 'unsigned int'.
> > The same is almost certainly true of tx_bytes.
> >
> > David
> >
> > > > while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > pr_debug("Sent skb %p\n", skb);
> > > >
> > > > - u64_stats_update_begin(&stats->tx_syncp);
> > > > - stats->tx_bytes += skb->len;
> > > > - stats->tx_packets++;
> > > > - u64_stats_update_end(&stats->tx_syncp);
> > > > + tx_bytes += skb->len;
> > > > + tx_packets++;
> > > >
> > > > dev_kfree_skb_any(skb);
> > > > }
> > > > +
> > > > + u64_stats_update_begin(&stats->tx_syncp);
> > > > + stats->tx_bytes += tx_bytes;
> > > > + stats->tx_packets =+ tx_packets;
> > > > + u64_stats_update_end(&stats->tx_syncp);
> > > > }
> > > >
> > > > static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > > > --
> > > > 1.7.1
> >
* Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
2014-10-15 10:51 ` David Laight
@ 2014-10-15 12:00 ` Michael S. Tsirkin
0 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 12:00 UTC (permalink / raw)
To: David Laight; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 10:51:32AM +0000, David Laight wrote:
> From: Michael S. Tsirkin
> > On Wed, Oct 15, 2014 at 09:49:01AM +0000, David Laight wrote:
> > > From: Of Michael S. Tsirkin
> > > > On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> > > > > Accumulate the sent packets and sent bytes in local variables and perform a
> > > > > single u64_stats_update_begin/end() after.
> > > > >
> > > > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > >
> > > > Not sure how much it's worth but since Eric suggested it ...
> > >
> > > Probably depends on the actual cost of u64_stats_update_begin/end
> > > against the likely extra saving of the tx_bytes and tx_packets
> > > values onto the stack across the call to dev_kfree_skb_any().
> > > (Which depends on the number of caller saved registers.)
> >
> > Yea, some benchmark results would be nice to see.
>
> If there are likely to be multiple skbs on the queue, the fastest
> code would probably do one 'virtqueue_get_all()' that returned
> a linked list of buffers, then follow the list to get the stats,
> and follow it again to free the skbs.
>
> David
Interesting. Each time we tried playing with linked lists in the past,
it simply destroyed performance.
Maybe this case is different, but I have my doubts.
> > > > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > > >
> > > > > ---
> > > > > drivers/net/virtio_net.c | 12 ++++++++----
> > > > > 1 files changed, 8 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 3d0ce44..a4d56b8 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> > > > > unsigned int len;
> > > > > struct virtnet_info *vi = sq->vq->vdev->priv;
> > > > > struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > > > > + u64 tx_bytes = 0, tx_packets = 0;
> > >
> > > tx_packets need only be 'unsigned int'.
> > > The same is almost certainly true of tx_bytes.
> > >
> > > David
> > >
> > > > > while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > > pr_debug("Sent skb %p\n", skb);
> > > > >
> > > > > - u64_stats_update_begin(&stats->tx_syncp);
> > > > > - stats->tx_bytes += skb->len;
> > > > > - stats->tx_packets++;
> > > > > - u64_stats_update_end(&stats->tx_syncp);
> > > > > + tx_bytes += skb->len;
> > > > > + tx_packets++;
> > > > >
> > > > > dev_kfree_skb_any(skb);
> > > > > }
> > > > > +
> > > > > + u64_stats_update_begin(&stats->tx_syncp);
> > > > > + stats->tx_bytes += tx_bytes;
> > > > > + stats->tx_packets =+ tx_packets;
> > > > > + u64_stats_update_end(&stats->tx_syncp);
> > > > > }
> > > > >
> > > > > static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > > > > --
> > > > > 1.7.1
> > >
>
* [RFC PATCH net-next 4/6] virtio-net: return the number of packets sent in free_old_xmit_skbs()
2014-10-15 7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
` (2 preceding siblings ...)
2014-10-15 7:25 ` [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs() Jason Wang
@ 2014-10-15 7:25 ` Jason Wang
2014-10-15 7:25 ` [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt Jason Wang
` (2 subsequent siblings)
6 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2014-10-15 7:25 UTC (permalink / raw)
To: rusty, mst, virtualization, linux-kernel, netdev
Cc: davem, eric.dumazet, Jason Wang
This patch makes free_old_xmit_skbs() return the number of packets sent;
this is necessary for calling this function from NAPI.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a4d56b8..ccf98f9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -827,7 +827,7 @@ static int virtnet_open(struct net_device *dev)
return 0;
}
-static void free_old_xmit_skbs(struct send_queue *sq)
+static int free_old_xmit_skbs(struct send_queue *sq)
{
struct sk_buff *skb;
unsigned int len;
@@ -848,6 +848,8 @@ static void free_old_xmit_skbs(struct send_queue *sq)
stats->tx_bytes += tx_bytes;
stats->tx_packets =+ tx_packets;
u64_stats_update_end(&stats->tx_syncp);
+
+ return tx_packets;
}
static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
--
1.7.1
* [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
2014-10-15 7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
` (3 preceding siblings ...)
2014-10-15 7:25 ` [RFC PATCH net-next 4/6] virtio-net: return the number of packets sent in free_old_xmit_skbs() Jason Wang
@ 2014-10-15 7:25 ` Jason Wang
2014-10-15 9:37 ` Eric Dumazet
2014-10-15 10:18 ` Michael S. Tsirkin
2014-10-15 7:25 ` [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain Jason Wang
2014-10-15 10:25 ` [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Michael S. Tsirkin
6 siblings, 2 replies; 36+ messages in thread
From: Jason Wang @ 2014-10-15 7:25 UTC (permalink / raw)
To: rusty, mst, virtualization, linux-kernel, netdev
Cc: davem, eric.dumazet, Jason Wang
Orphaning the skb in ndo_start_xmit() breaks socket accounting and packet
queuing. This in fact breaks lots of things, such as pktgen and several
TCP optimizations, and also makes it impossible to implement BQL for
virtio-net.
This patch tries to solve this issue by enabling the tx interrupt. To
avoid introducing extra spinlocks, a tx NAPI is scheduled to free
those packets.
More tx interrupt mitigation methods could be used on top.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 125 +++++++++++++++++++++++++++++++---------------
1 files changed, 85 insertions(+), 40 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ccf98f9..2afc2e2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
/* Name of the send queue: output.$index */
char name[40];
+
+ struct napi_struct napi;
};
/* Internal representation of a receive virtqueue */
@@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
return p;
}
+static int free_old_xmit_skbs(struct send_queue *sq, int budget)
+{
+ struct sk_buff *skb;
+ unsigned int len;
+ struct virtnet_info *vi = sq->vq->vdev->priv;
+ struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+ u64 tx_bytes = 0, tx_packets = 0;
+
+ while (tx_packets < budget &&
+ (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+ pr_debug("Sent skb %p\n", skb);
+
+ tx_bytes += skb->len;
+ tx_packets++;
+
+ dev_kfree_skb_any(skb);
+ }
+
+ u64_stats_update_begin(&stats->tx_syncp);
+ stats->tx_bytes += tx_bytes;
+ stats->tx_packets =+ tx_packets;
+ u64_stats_update_end(&stats->tx_syncp);
+
+ return tx_packets;
+}
+
static void skb_xmit_done(struct virtqueue *vq)
{
struct virtnet_info *vi = vq->vdev->priv;
+ struct send_queue *sq = &vi->sq[vq2txq(vq)];
- /* Suppress further interrupts. */
- virtqueue_disable_cb(vq);
-
- /* We were probably waiting for more output buffers. */
- netif_wake_subqueue(vi->dev, vq2txq(vq));
+ if (napi_schedule_prep(&sq->napi)) {
+ __napi_schedule(&sq->napi);
+ }
}
static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -774,7 +801,39 @@ again:
return received;
}
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+ struct send_queue *sq =
+ container_of(napi, struct send_queue, napi);
+ struct virtnet_info *vi = sq->vq->vdev->priv;
+ struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+ unsigned int r, sent = 0;
+
+again:
+ __netif_tx_lock(txq, smp_processor_id());
+ virtqueue_disable_cb(sq->vq);
+ sent += free_old_xmit_skbs(sq, budget - sent);
+
+ if (sent < budget) {
+ r = virtqueue_enable_cb_prepare(sq->vq);
+ napi_complete(napi);
+ __netif_tx_unlock(txq);
+ if (unlikely(virtqueue_poll(sq->vq, r)) &&
+ napi_schedule_prep(napi)) {
+ virtqueue_disable_cb(sq->vq);
+ __napi_schedule(napi);
+ goto again;
+ }
+ } else {
+ __netif_tx_unlock(txq);
+ }
+
+ netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+ return sent;
+}
+
#ifdef CONFIG_NET_RX_BUSY_POLL
+
/* must be called with local_bh_disable()d */
static int virtnet_busy_poll(struct napi_struct *napi)
{
@@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
schedule_delayed_work(&vi->refill, 0);
virtnet_napi_enable(&vi->rq[i]);
+ napi_enable(&vi->sq[i].napi);
}
return 0;
}
-static int free_old_xmit_skbs(struct send_queue *sq)
-{
- struct sk_buff *skb;
- unsigned int len;
- struct virtnet_info *vi = sq->vq->vdev->priv;
- struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
- u64 tx_bytes = 0, tx_packets = 0;
-
- while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
- pr_debug("Sent skb %p\n", skb);
-
- tx_bytes += skb->len;
- tx_packets++;
-
- dev_kfree_skb_any(skb);
- }
-
- u64_stats_update_begin(&stats->tx_syncp);
- stats->tx_bytes += tx_bytes;
- stats->tx_packets =+ tx_packets;
- u64_stats_update_end(&stats->tx_syncp);
-
- return tx_packets;
-}
-
static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
{
struct skb_vnet_hdr *hdr;
@@ -917,6 +952,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
sg_set_buf(sq->sg, hdr, hdr_len);
num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
}
+
return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
}
@@ -925,10 +961,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
struct virtnet_info *vi = netdev_priv(dev);
int qnum = skb_get_queue_mapping(skb);
struct send_queue *sq = &vi->sq[qnum];
- int err;
+ int err, qsize = virtqueue_get_vring_size(sq->vq);
+ virtqueue_disable_cb(sq->vq);
/* Free up any pending old buffers before queueing new ones. */
- free_old_xmit_skbs(sq);
+ free_old_xmit_skbs(sq, qsize);
/* Try to transmit */
err = xmit_skb(sq, skb);
@@ -944,22 +981,20 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
}
- /* Don't wait up for transmitted skbs to be freed. */
- skb_orphan(skb);
- nf_reset(skb);
-
/* Apparently nice girls don't return TX_BUSY; stop the queue
* before it gets out of hand. Naturally, this wastes entries. */
if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
netif_stop_subqueue(dev, qnum);
if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
/* More just got used, free them then recheck. */
- free_old_xmit_skbs(sq);
+ free_old_xmit_skbs(sq, qsize);
if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
netif_start_subqueue(dev, qnum);
virtqueue_disable_cb(sq->vq);
}
}
+ } else if (virtqueue_enable_cb(sq->vq)) {
+ free_old_xmit_skbs(sq, qsize);
}
if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
@@ -1141,8 +1176,10 @@ static int virtnet_close(struct net_device *dev)
/* Make sure refill_work doesn't re-enable napi! */
cancel_delayed_work_sync(&vi->refill);
- for (i = 0; i < vi->max_queue_pairs; i++)
+ for (i = 0; i < vi->max_queue_pairs; i++) {
napi_disable(&vi->rq[i].napi);
+ napi_disable(&vi->sq[i].napi);
+ }
return 0;
}
@@ -1461,8 +1498,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
{
int i;
- for (i = 0; i < vi->max_queue_pairs; i++)
+ for (i = 0; i < vi->max_queue_pairs; i++) {
netif_napi_del(&vi->rq[i].napi);
+ netif_napi_del(&vi->sq[i].napi);
+ }
kfree(vi->rq);
kfree(vi->sq);
@@ -1616,6 +1655,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
napi_weight);
napi_hash_add(&vi->rq[i].napi);
+ netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
+ napi_weight);
sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
@@ -1920,8 +1961,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
if (netif_running(vi->dev)) {
for (i = 0; i < vi->max_queue_pairs; i++) {
napi_disable(&vi->rq[i].napi);
+ napi_disable(&vi->sq[i].napi);
napi_hash_del(&vi->rq[i].napi);
netif_napi_del(&vi->rq[i].napi);
+ netif_napi_del(&vi->sq[i].napi);
}
}
@@ -1946,8 +1989,10 @@ static int virtnet_restore(struct virtio_device *vdev)
if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
schedule_delayed_work(&vi->refill, 0);
- for (i = 0; i < vi->max_queue_pairs; i++)
+ for (i = 0; i < vi->max_queue_pairs; i++) {
virtnet_napi_enable(&vi->rq[i]);
+ napi_enable(&vi->sq[i].napi);
+ }
}
netif_device_attach(vi->dev);
--
1.7.1
* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
2014-10-15 7:25 ` [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt Jason Wang
@ 2014-10-15 9:37 ` Eric Dumazet
2014-10-15 10:21 ` Jason Wang
2014-10-15 10:18 ` Michael S. Tsirkin
1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2014-10-15 9:37 UTC (permalink / raw)
To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, davem
On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:
...
> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
> +{
> + struct sk_buff *skb;
> + unsigned int len;
> + struct virtnet_info *vi = sq->vq->vdev->priv;
> + struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> + u64 tx_bytes = 0, tx_packets = 0;
> +
> + while (tx_packets < budget &&
> + (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> + pr_debug("Sent skb %p\n", skb);
> +
> + tx_bytes += skb->len;
> + tx_packets++;
> +
> + dev_kfree_skb_any(skb);
> + }
> +
> + u64_stats_update_begin(&stats->tx_syncp);
> + stats->tx_bytes += tx_bytes;
> + stats->tx_packets =+ tx_packets;
stats->tx_packets += tx_packets;
> + u64_stats_update_end(&stats->tx_syncp);
> +
> + return tx_packets;
> +}
* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
2014-10-15 9:37 ` Eric Dumazet
@ 2014-10-15 10:21 ` Jason Wang
0 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2014-10-15 10:21 UTC (permalink / raw)
To: Eric Dumazet; +Cc: mst, netdev, linux-kernel, virtualization, davem
On 10/15/2014 05:37 PM, Eric Dumazet wrote:
> On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:
>
> ...
>
>> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
>> +{
>> + struct sk_buff *skb;
>> + unsigned int len;
>> + struct virtnet_info *vi = sq->vq->vdev->priv;
>> + struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> + u64 tx_bytes = 0, tx_packets = 0;
>> +
>> + while (tx_packets < budget &&
>> + (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> + pr_debug("Sent skb %p\n", skb);
>> +
>> + tx_bytes += skb->len;
>> + tx_packets++;
>> +
>> + dev_kfree_skb_any(skb);
>> + }
>> +
>> + u64_stats_update_begin(&stats->tx_syncp);
>> + stats->tx_bytes += tx_bytes;
>> + stats->tx_packets =+ tx_packets;
>
> stats->tx_packets += tx_packets;
My bad, will correct it in next version.
Thanks
* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
2014-10-15 7:25 ` [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt Jason Wang
2014-10-15 9:37 ` Eric Dumazet
@ 2014-10-15 10:18 ` Michael S. Tsirkin
2014-10-15 10:25 ` Jason Wang
1 sibling, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:18 UTC (permalink / raw)
To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote:
> Orphan skb in ndo_start_xmit() breaks socket accounting and packet
> queuing. This in fact breaks lots of things such as pktgen and several
> TCP optimizations. And also make BQL can't be implemented for
> virtio-net.
>
> This patch tries to solve this issue by enabling tx interrupt. To
> avoid introducing extra spinlocks, a tx napi was scheduled to free
> those packets.
>
> More tx interrupt mitigation method could be used on top.
>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/net/virtio_net.c | 125 +++++++++++++++++++++++++++++++---------------
> 1 files changed, 85 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ccf98f9..2afc2e2 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -72,6 +72,8 @@ struct send_queue {
>
> /* Name of the send queue: output.$index */
> char name[40];
> +
> + struct napi_struct napi;
> };
>
> /* Internal representation of a receive virtqueue */
> @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> return p;
> }
>
> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
> +{
> + struct sk_buff *skb;
> + unsigned int len;
> + struct virtnet_info *vi = sq->vq->vdev->priv;
> + struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> + u64 tx_bytes = 0, tx_packets = 0;
> +
> + while (tx_packets < budget &&
> + (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> + pr_debug("Sent skb %p\n", skb);
> +
> + tx_bytes += skb->len;
> + tx_packets++;
> +
> + dev_kfree_skb_any(skb);
> + }
> +
> + u64_stats_update_begin(&stats->tx_syncp);
> + stats->tx_bytes += tx_bytes;
> + stats->tx_packets =+ tx_packets;
> + u64_stats_update_end(&stats->tx_syncp);
> +
> + return tx_packets;
> +}
> +
> static void skb_xmit_done(struct virtqueue *vq)
> {
> struct virtnet_info *vi = vq->vdev->priv;
> + struct send_queue *sq = &vi->sq[vq2txq(vq)];
>
> - /* Suppress further interrupts. */
> - virtqueue_disable_cb(vq);
> -
> - /* We were probably waiting for more output buffers. */
> - netif_wake_subqueue(vi->dev, vq2txq(vq));
> + if (napi_schedule_prep(&sq->napi)) {
> + __napi_schedule(&sq->napi);
> + }
> }
>
> static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
> @@ -774,7 +801,39 @@ again:
> return received;
> }
>
> +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> +{
> + struct send_queue *sq =
> + container_of(napi, struct send_queue, napi);
> + struct virtnet_info *vi = sq->vq->vdev->priv;
> + struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> + unsigned int r, sent = 0;
> +
> +again:
> + __netif_tx_lock(txq, smp_processor_id());
> + virtqueue_disable_cb(sq->vq);
> + sent += free_old_xmit_skbs(sq, budget - sent);
> +
> + if (sent < budget) {
> + r = virtqueue_enable_cb_prepare(sq->vq);
> + napi_complete(napi);
> + __netif_tx_unlock(txq);
> + if (unlikely(virtqueue_poll(sq->vq, r)) &&
So you are enabling the callback on the next packet,
which is almost sure to cause an interrupt storm
on the guest.
I think it's a bad idea; this is why I used
enable_cb_delayed in my patch.
> + napi_schedule_prep(napi)) {
> + virtqueue_disable_cb(sq->vq);
> + __napi_schedule(napi);
> + goto again;
> + }
> + } else {
> + __netif_tx_unlock(txq);
> + }
> +
> + netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
> + return sent;
> +}
> +
> #ifdef CONFIG_NET_RX_BUSY_POLL
> +
> /* must be called with local_bh_disable()d */
> static int virtnet_busy_poll(struct napi_struct *napi)
> {
> @@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
> if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> schedule_delayed_work(&vi->refill, 0);
> virtnet_napi_enable(&vi->rq[i]);
> + napi_enable(&vi->sq[i].napi);
> }
>
> return 0;
> }
>
> -static int free_old_xmit_skbs(struct send_queue *sq)
> -{
> - struct sk_buff *skb;
> - unsigned int len;
> - struct virtnet_info *vi = sq->vq->vdev->priv;
> - struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> - u64 tx_bytes = 0, tx_packets = 0;
> -
> - while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> - pr_debug("Sent skb %p\n", skb);
> -
> - tx_bytes += skb->len;
> - tx_packets++;
> -
> - dev_kfree_skb_any(skb);
> - }
> -
> - u64_stats_update_begin(&stats->tx_syncp);
> - stats->tx_bytes += tx_bytes;
> - stats->tx_packets =+ tx_packets;
> - u64_stats_update_end(&stats->tx_syncp);
> -
> - return tx_packets;
> -}
> -
So you end up moving it all anyway; why bother splitting out
minor changes in the previous patches?
> static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> {
> struct skb_vnet_hdr *hdr;
> @@ -917,6 +952,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> sg_set_buf(sq->sg, hdr, hdr_len);
> num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
> }
> +
> return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> }
>
> @@ -925,10 +961,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> struct virtnet_info *vi = netdev_priv(dev);
> int qnum = skb_get_queue_mapping(skb);
> struct send_queue *sq = &vi->sq[qnum];
> - int err;
> + int err, qsize = virtqueue_get_vring_size(sq->vq);
>
> + virtqueue_disable_cb(sq->vq);
> /* Free up any pending old buffers before queueing new ones. */
> - free_old_xmit_skbs(sq);
> + free_old_xmit_skbs(sq, qsize);
>
> /* Try to transmit */
> err = xmit_skb(sq, skb);
> @@ -944,22 +981,20 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> return NETDEV_TX_OK;
> }
>
> - /* Don't wait up for transmitted skbs to be freed. */
> - skb_orphan(skb);
> - nf_reset(skb);
> -
> /* Apparently nice girls don't return TX_BUSY; stop the queue
> * before it gets out of hand. Naturally, this wastes entries. */
> if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
> netif_stop_subqueue(dev, qnum);
> if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
> /* More just got used, free them then recheck. */
> - free_old_xmit_skbs(sq);
> + free_old_xmit_skbs(sq, qsize);
> if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
> netif_start_subqueue(dev, qnum);
> virtqueue_disable_cb(sq->vq);
> }
> }
> + } else if (virtqueue_enable_cb(sq->vq)) {
> + free_old_xmit_skbs(sq, qsize);
> }
>
> if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
> @@ -1141,8 +1176,10 @@ static int virtnet_close(struct net_device *dev)
> /* Make sure refill_work doesn't re-enable napi! */
> cancel_delayed_work_sync(&vi->refill);
>
> - for (i = 0; i < vi->max_queue_pairs; i++)
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> napi_disable(&vi->rq[i].napi);
> + napi_disable(&vi->sq[i].napi);
> + }
>
> return 0;
> }
> @@ -1461,8 +1498,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> {
> int i;
>
> - for (i = 0; i < vi->max_queue_pairs; i++)
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> netif_napi_del(&vi->rq[i].napi);
> + netif_napi_del(&vi->sq[i].napi);
> + }
>
> kfree(vi->rq);
> kfree(vi->sq);
> @@ -1616,6 +1655,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
> netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
> napi_weight);
> napi_hash_add(&vi->rq[i].napi);
> + netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
> + napi_weight);
>
> sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
> ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
> @@ -1920,8 +1961,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
> if (netif_running(vi->dev)) {
> for (i = 0; i < vi->max_queue_pairs; i++) {
> napi_disable(&vi->rq[i].napi);
> + napi_disable(&vi->sq[i].napi);
> napi_hash_del(&vi->rq[i].napi);
> netif_napi_del(&vi->rq[i].napi);
> + netif_napi_del(&vi->sq[i].napi);
> }
> }
>
> @@ -1946,8 +1989,10 @@ static int virtnet_restore(struct virtio_device *vdev)
> if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> schedule_delayed_work(&vi->refill, 0);
>
> - for (i = 0; i < vi->max_queue_pairs; i++)
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> virtnet_napi_enable(&vi->rq[i]);
> + napi_enable(&vi->sq[i].napi);
> + }
> }
>
> netif_device_attach(vi->dev);
> --
> 1.7.1
* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
2014-10-15 10:18 ` Michael S. Tsirkin
@ 2014-10-15 10:25 ` Jason Wang
2014-10-15 10:43 ` Michael S. Tsirkin
0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 10:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On 10/15/2014 06:18 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote:
>> > Orphan skb in ndo_start_xmit() breaks socket accounting and packet
>> > queuing. This in fact breaks lots of things such as pktgen and several
>> > TCP optimizations. And also make BQL can't be implemented for
>> > virtio-net.
>> >
>> > This patch tries to solve this issue by enabling tx interrupt. To
>> > avoid introducing extra spinlocks, a tx napi was scheduled to free
>> > those packets.
>> >
>> > More tx interrupt mitigation method could be used on top.
>> >
>> > Cc: Rusty Russell <rusty@rustcorp.com.au>
>> > Cc: Michael S. Tsirkin <mst@redhat.com>
>> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>> > ---
>> > drivers/net/virtio_net.c | 125 +++++++++++++++++++++++++++++++---------------
>> > 1 files changed, 85 insertions(+), 40 deletions(-)
>> >
>> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> > index ccf98f9..2afc2e2 100644
>> > --- a/drivers/net/virtio_net.c
>> > +++ b/drivers/net/virtio_net.c
>> > @@ -72,6 +72,8 @@ struct send_queue {
>> >
>> > /* Name of the send queue: output.$index */
>> > char name[40];
>> > +
>> > + struct napi_struct napi;
>> > };
>> >
>> > /* Internal representation of a receive virtqueue */
>> > @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>> > return p;
>> > }
>> >
>> > +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
>> > +{
>> > + struct sk_buff *skb;
>> > + unsigned int len;
>> > + struct virtnet_info *vi = sq->vq->vdev->priv;
>> > + struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> > + u64 tx_bytes = 0, tx_packets = 0;
>> > +
>> > + while (tx_packets < budget &&
>> > + (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> > + pr_debug("Sent skb %p\n", skb);
>> > +
>> > + tx_bytes += skb->len;
>> > + tx_packets++;
>> > +
>> > + dev_kfree_skb_any(skb);
>> > + }
>> > +
>> > + u64_stats_update_begin(&stats->tx_syncp);
>> > + stats->tx_bytes += tx_bytes;
>> > + stats->tx_packets =+ tx_packets;
>> > + u64_stats_update_end(&stats->tx_syncp);
>> > +
>> > + return tx_packets;
>> > +}
>> > +
>> > static void skb_xmit_done(struct virtqueue *vq)
>> > {
>> > struct virtnet_info *vi = vq->vdev->priv;
>> > + struct send_queue *sq = &vi->sq[vq2txq(vq)];
>> >
>> > - /* Suppress further interrupts. */
>> > - virtqueue_disable_cb(vq);
>> > -
>> > - /* We were probably waiting for more output buffers. */
>> > - netif_wake_subqueue(vi->dev, vq2txq(vq));
>> > + if (napi_schedule_prep(&sq->napi)) {
>> > + __napi_schedule(&sq->napi);
>> > + }
>> > }
>> >
>> > static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
>> > @@ -774,7 +801,39 @@ again:
>> > return received;
>> > }
>> >
>> > +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>> > +{
>> > + struct send_queue *sq =
>> > + container_of(napi, struct send_queue, napi);
>> > + struct virtnet_info *vi = sq->vq->vdev->priv;
>> > + struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
>> > + unsigned int r, sent = 0;
>> > +
>> > +again:
>> > + __netif_tx_lock(txq, smp_processor_id());
>> > + virtqueue_disable_cb(sq->vq);
>> > + sent += free_old_xmit_skbs(sq, budget - sent);
>> > +
>> > + if (sent < budget) {
>> > + r = virtqueue_enable_cb_prepare(sq->vq);
>> > + napi_complete(napi);
>> > + __netif_tx_unlock(txq);
>> > + if (unlikely(virtqueue_poll(sq->vq, r)) &&
> So you are enabling callback on the next packet,
> which is almost sure to cause an interrupt storm
> on the guest.
>
>
> I think it's a bad idea, this is why I used
> enable_cb_delayed in my patch.
Right, will do this, but may also need to make sure the used event never
goes backwards since we may call virtqueue_enable_cb_avail().
>
>
>> > + napi_schedule_prep(napi)) {
>> > + virtqueue_disable_cb(sq->vq);
>> > + __napi_schedule(napi);
>> > + goto again;
>> > + }
>> > + } else {
>> > + __netif_tx_unlock(txq);
>> > + }
>> > +
>> > + netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
>> > + return sent;
>> > +}
>> > +
>> > #ifdef CONFIG_NET_RX_BUSY_POLL
>> > +
>> > /* must be called with local_bh_disable()d */
>> > static int virtnet_busy_poll(struct napi_struct *napi)
>> > {
>> > @@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
>> > if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
>> > schedule_delayed_work(&vi->refill, 0);
>> > virtnet_napi_enable(&vi->rq[i]);
>> > + napi_enable(&vi->sq[i].napi);
>> > }
>> >
>> > return 0;
>> > }
>> >
>> > -static int free_old_xmit_skbs(struct send_queue *sq)
>> > -{
>> > - struct sk_buff *skb;
>> > - unsigned int len;
>> > - struct virtnet_info *vi = sq->vq->vdev->priv;
>> > - struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> > - u64 tx_bytes = 0, tx_packets = 0;
>> > -
>> > - while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> > - pr_debug("Sent skb %p\n", skb);
>> > -
>> > - tx_bytes += skb->len;
>> > - tx_packets++;
>> > -
>> > - dev_kfree_skb_any(skb);
>> > - }
>> > -
>> > - u64_stats_update_begin(&stats->tx_syncp);
>> > - stats->tx_bytes += tx_bytes;
>> > - stats->tx_packets += tx_packets;
>> > - u64_stats_update_end(&stats->tx_syncp);
>> > -
>> > - return tx_packets;
>> > -}
>> > -
> So you end up moving it all anyway, why bother splitting out
> minor changes in previous patches?
To make review easier, but if you think it in fact complicates things, I
will pack them into a single patch.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
2014-10-15 10:25 ` Jason Wang
@ 2014-10-15 10:43 ` Michael S. Tsirkin
2014-10-15 11:00 ` Jason Wang
0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:43 UTC (permalink / raw)
To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 06:25:25PM +0800, Jason Wang wrote:
> On 10/15/2014 06:18 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote:
> >> > Orphaning the skb in ndo_start_xmit() breaks socket accounting and packet
> >> > queuing. This in fact breaks lots of things such as pktgen and several
> >> > TCP optimizations, and it also makes BQL impossible to implement for
> >> > virtio-net.
> >> >
> >> > This patch tries to solve this issue by enabling tx interrupt. To
> >> > avoid introducing extra spinlocks, a tx napi is scheduled to free
> >> > those packets.
> >> >
> >> > More tx interrupt mitigation methods could be built on top.
> >> >
> >> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> >> > Cc: Michael S. Tsirkin <mst@redhat.com>
> >> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> > ---
> >> > drivers/net/virtio_net.c | 125 +++++++++++++++++++++++++++++++---------------
> >> > 1 files changed, 85 insertions(+), 40 deletions(-)
> >> >
> >> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> > index ccf98f9..2afc2e2 100644
> >> > --- a/drivers/net/virtio_net.c
> >> > +++ b/drivers/net/virtio_net.c
> >> > @@ -72,6 +72,8 @@ struct send_queue {
> >> >
> >> > /* Name of the send queue: output.$index */
> >> > char name[40];
> >> > +
> >> > + struct napi_struct napi;
> >> > };
> >> >
> >> > /* Internal representation of a receive virtqueue */
> >> > @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> >> > return p;
> >> > }
> >> >
> >> > +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
> >> > +{
> >> > + struct sk_buff *skb;
> >> > + unsigned int len;
> >> > + struct virtnet_info *vi = sq->vq->vdev->priv;
> >> > + struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> >> > + u64 tx_bytes = 0, tx_packets = 0;
> >> > +
> >> > + while (tx_packets < budget &&
> >> > + (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> >> > + pr_debug("Sent skb %p\n", skb);
> >> > +
> >> > + tx_bytes += skb->len;
> >> > + tx_packets++;
> >> > +
> >> > + dev_kfree_skb_any(skb);
> >> > + }
> >> > +
> >> > + u64_stats_update_begin(&stats->tx_syncp);
> >> > + stats->tx_bytes += tx_bytes;
> >> > + stats->tx_packets += tx_packets;
> >> > + u64_stats_update_end(&stats->tx_syncp);
> >> > +
> >> > + return tx_packets;
> >> > +}
> >> > +
> >> > static void skb_xmit_done(struct virtqueue *vq)
> >> > {
> >> > struct virtnet_info *vi = vq->vdev->priv;
> >> > + struct send_queue *sq = &vi->sq[vq2txq(vq)];
> >> >
> >> > - /* Suppress further interrupts. */
> >> > - virtqueue_disable_cb(vq);
> >> > -
> >> > - /* We were probably waiting for more output buffers. */
> >> > - netif_wake_subqueue(vi->dev, vq2txq(vq));
> >> > + if (napi_schedule_prep(&sq->napi)) {
> >> > + __napi_schedule(&sq->napi);
> >> > + }
> >> > }
> >> >
> >> > static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
> >> > @@ -774,7 +801,39 @@ again:
> >> > return received;
> >> > }
> >> >
> >> > +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> >> > +{
> >> > + struct send_queue *sq =
> >> > + container_of(napi, struct send_queue, napi);
> >> > + struct virtnet_info *vi = sq->vq->vdev->priv;
> >> > + struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> >> > + unsigned int r, sent = 0;
> >> > +
> >> > +again:
> >> > + __netif_tx_lock(txq, smp_processor_id());
> >> > + virtqueue_disable_cb(sq->vq);
> >> > + sent += free_old_xmit_skbs(sq, budget - sent);
> >> > +
> >> > + if (sent < budget) {
> >> > + r = virtqueue_enable_cb_prepare(sq->vq);
> >> > + napi_complete(napi);
> >> > + __netif_tx_unlock(txq);
> >> > + if (unlikely(virtqueue_poll(sq->vq, r)) &&
> > So you are enabling callback on the next packet,
> > which is almost sure to cause an interrupt storm
> > on the guest.
> >
> >
> > I think it's a bad idea, this is why I used
> > enable_cb_delayed in my patch.
>
> Right, will do this, but may also need to make sure the used event never
> goes backwards since we may call virtqueue_enable_cb_avail().
That's why my patch always calls virtqueue_enable_cb_delayed.
So no need for hacks.
Maybe you can review my patch and comment?
> >
> >
> >> > + napi_schedule_prep(napi)) {
> >> > + virtqueue_disable_cb(sq->vq);
> >> > + __napi_schedule(napi);
> >> > + goto again;
> >> > + }
> >> > + } else {
> >> > + __netif_tx_unlock(txq);
> >> > + }
> >> > +
> >> > + netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
> >> > + return sent;
> >> > +}
> >> > +
> >> > #ifdef CONFIG_NET_RX_BUSY_POLL
> >> > +
> >> > /* must be called with local_bh_disable()d */
> >> > static int virtnet_busy_poll(struct napi_struct *napi)
> >> > {
> >> > @@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
> >> > if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> >> > schedule_delayed_work(&vi->refill, 0);
> >> > virtnet_napi_enable(&vi->rq[i]);
> >> > + napi_enable(&vi->sq[i].napi);
> >> > }
> >> >
> >> > return 0;
> >> > }
> >> >
> >> > -static int free_old_xmit_skbs(struct send_queue *sq)
> >> > -{
> >> > - struct sk_buff *skb;
> >> > - unsigned int len;
> >> > - struct virtnet_info *vi = sq->vq->vdev->priv;
> >> > - struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> >> > - u64 tx_bytes = 0, tx_packets = 0;
> >> > -
> >> > - while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> >> > - pr_debug("Sent skb %p\n", skb);
> >> > -
> >> > - tx_bytes += skb->len;
> >> > - tx_packets++;
> >> > -
> >> > - dev_kfree_skb_any(skb);
> >> > - }
> >> > -
> >> > - u64_stats_update_begin(&stats->tx_syncp);
> >> > - stats->tx_bytes += tx_bytes;
> >> > - stats->tx_packets += tx_packets;
> >> > - u64_stats_update_end(&stats->tx_syncp);
> >> > -
> >> > - return tx_packets;
> >> > -}
> >> > -
> > So you end up moving it all anyway, why bother splitting out
> > minor changes in previous patches?
>
> To make review easier, but if you think it in fact complicates things, I
> will pack them into a single patch.
* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
2014-10-15 10:43 ` Michael S. Tsirkin
@ 2014-10-15 11:00 ` Jason Wang
0 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2014-10-15 11:00 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On 10/15/2014 06:43 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 06:25:25PM +0800, Jason Wang wrote:
>> > On 10/15/2014 06:18 PM, Michael S. Tsirkin wrote:
>>> > > On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote:
>>>>> > >> > Orphaning the skb in ndo_start_xmit() breaks socket accounting and packet
>>>>> > >> > queuing. This in fact breaks lots of things such as pktgen and several
>>>>> > >> > TCP optimizations, and it also makes BQL impossible to implement for
>>>>> > >> > virtio-net.
>>>>> > >> >
>>>>> > >> > This patch tries to solve this issue by enabling tx interrupt. To
>>>>> > >> > avoid introducing extra spinlocks, a tx napi is scheduled to free
>>>>> > >> > those packets.
>>>>> > >> >
>>>>> > >> > More tx interrupt mitigation methods could be built on top.
>>>>> > >> >
>>>>> > >> > Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>>> > >> > Cc: Michael S. Tsirkin <mst@redhat.com>
>>>>> > >> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> > >> > ---
>>>>> > >> > drivers/net/virtio_net.c | 125 +++++++++++++++++++++++++++++++---------------
>>>>> > >> > 1 files changed, 85 insertions(+), 40 deletions(-)
>>>>> > >> >
>>>>> > >> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>> > >> > index ccf98f9..2afc2e2 100644
>>>>> > >> > --- a/drivers/net/virtio_net.c
>>>>> > >> > +++ b/drivers/net/virtio_net.c
>>>>> > >> > @@ -72,6 +72,8 @@ struct send_queue {
>>>>> > >> >
>>>>> > >> > /* Name of the send queue: output.$index */
>>>>> > >> > char name[40];
>>>>> > >> > +
>>>>> > >> > + struct napi_struct napi;
>>>>> > >> > };
>>>>> > >> >
>>>>> > >> > /* Internal representation of a receive virtqueue */
>>>>> > >> > @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>>>>> > >> > return p;
>>>>> > >> > }
>>>>> > >> >
>>>>> > >> > +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
>>>>> > >> > +{
>>>>> > >> > + struct sk_buff *skb;
>>>>> > >> > + unsigned int len;
>>>>> > >> > + struct virtnet_info *vi = sq->vq->vdev->priv;
>>>>> > >> > + struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>>>>> > >> > + u64 tx_bytes = 0, tx_packets = 0;
>>>>> > >> > +
>>>>> > >> > + while (tx_packets < budget &&
>>>>> > >> > + (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>>>>> > >> > + pr_debug("Sent skb %p\n", skb);
>>>>> > >> > +
>>>>> > >> > + tx_bytes += skb->len;
>>>>> > >> > + tx_packets++;
>>>>> > >> > +
>>>>> > >> > + dev_kfree_skb_any(skb);
>>>>> > >> > + }
>>>>> > >> > +
>>>>> > >> > + u64_stats_update_begin(&stats->tx_syncp);
>>>>> > >> > + stats->tx_bytes += tx_bytes;
>>>>> > >> > + stats->tx_packets += tx_packets;
>>>>> > >> > + u64_stats_update_end(&stats->tx_syncp);
>>>>> > >> > +
>>>>> > >> > + return tx_packets;
>>>>> > >> > +}
>>>>> > >> > +
>>>>> > >> > static void skb_xmit_done(struct virtqueue *vq)
>>>>> > >> > {
>>>>> > >> > struct virtnet_info *vi = vq->vdev->priv;
>>>>> > >> > + struct send_queue *sq = &vi->sq[vq2txq(vq)];
>>>>> > >> >
>>>>> > >> > - /* Suppress further interrupts. */
>>>>> > >> > - virtqueue_disable_cb(vq);
>>>>> > >> > -
>>>>> > >> > - /* We were probably waiting for more output buffers. */
>>>>> > >> > - netif_wake_subqueue(vi->dev, vq2txq(vq));
>>>>> > >> > + if (napi_schedule_prep(&sq->napi)) {
>>>>> > >> > + __napi_schedule(&sq->napi);
>>>>> > >> > + }
>>>>> > >> > }
>>>>> > >> >
>>>>> > >> > static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
>>>>> > >> > @@ -774,7 +801,39 @@ again:
>>>>> > >> > return received;
>>>>> > >> > }
>>>>> > >> >
>>>>> > >> > +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>>>>> > >> > +{
>>>>> > >> > + struct send_queue *sq =
>>>>> > >> > + container_of(napi, struct send_queue, napi);
>>>>> > >> > + struct virtnet_info *vi = sq->vq->vdev->priv;
>>>>> > >> > + struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
>>>>> > >> > + unsigned int r, sent = 0;
>>>>> > >> > +
>>>>> > >> > +again:
>>>>> > >> > + __netif_tx_lock(txq, smp_processor_id());
>>>>> > >> > + virtqueue_disable_cb(sq->vq);
>>>>> > >> > + sent += free_old_xmit_skbs(sq, budget - sent);
>>>>> > >> > +
>>>>> > >> > + if (sent < budget) {
>>>>> > >> > + r = virtqueue_enable_cb_prepare(sq->vq);
>>>>> > >> > + napi_complete(napi);
>>>>> > >> > + __netif_tx_unlock(txq);
>>>>> > >> > + if (unlikely(virtqueue_poll(sq->vq, r)) &&
>>> > > So you are enabling callback on the next packet,
>>> > > which is almost sure to cause an interrupt storm
>>> > > on the guest.
>>> > >
>>> > >
>>> > > I think it's a bad idea, this is why I used
>>> > > enable_cb_delayed in my patch.
>> >
>> > Right, will do this, but may also need to make sure the used event never
>> > goes backwards since we may call virtqueue_enable_cb_avail().
> That's why my patch always calls virtqueue_enable_cb_delayed.
> So no need for hacks.
>
> Maybe you can review my patch and comment?
I think I missed the virtqueue_enable_cb_delayed() in your patch when
reviewing it. Will do.
* [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
2014-10-15 7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
` (4 preceding siblings ...)
2014-10-15 7:25 ` [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt Jason Wang
@ 2014-10-15 7:25 ` Jason Wang
2014-10-15 10:22 ` Michael S. Tsirkin
2014-10-15 10:25 ` [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Michael S. Tsirkin
6 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 7:25 UTC (permalink / raw)
To: rusty, mst, virtualization, linux-kernel, netdev
Cc: davem, eric.dumazet, Jason Wang
With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
enables tx interrupt only for the final skb in the chain if
needed. This helps mitigate tx interrupts.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2afc2e2..5943f3f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -993,12 +993,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
virtqueue_disable_cb(sq->vq);
}
}
- } else if (virtqueue_enable_cb(sq->vq)) {
- free_old_xmit_skbs(sq, qsize);
}
- if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
+ if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) {
virtqueue_kick(sq->vq);
+ if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS &&
+ unlikely(virtqueue_enable_cb_avail(sq->vq))) {
+ free_old_xmit_skbs(sq, qsize);
+ virtqueue_disable_cb(sq->vq);
+ }
+ }
return NETDEV_TX_OK;
}
--
1.7.1
* Re: [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
2014-10-15 7:25 ` [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain Jason Wang
@ 2014-10-15 10:22 ` Michael S. Tsirkin
2014-10-15 10:31 ` Jason Wang
0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:22 UTC (permalink / raw)
To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote:
> With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
> enables tx interrupt only for the final skb in the chain if
> needed. This helps mitigate tx interrupts.
>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
I think you should split xmit_more stuff out.
> ---
> drivers/net/virtio_net.c | 10 +++++++---
> 1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2afc2e2..5943f3f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -993,12 +993,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> virtqueue_disable_cb(sq->vq);
> }
> }
> - } else if (virtqueue_enable_cb(sq->vq)) {
> - free_old_xmit_skbs(sq, qsize);
> }
>
> - if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
> + if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) {
> virtqueue_kick(sq->vq);
> + if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS &&
> + unlikely(virtqueue_enable_cb_avail(sq->vq))) {
> + free_old_xmit_skbs(sq, qsize);
> + virtqueue_disable_cb(sq->vq);
> + }
> + }
>
> return NETDEV_TX_OK;
> }
> --
> 1.7.1
* Re: [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
2014-10-15 10:22 ` Michael S. Tsirkin
@ 2014-10-15 10:31 ` Jason Wang
2014-10-15 10:46 ` Michael S. Tsirkin
0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 10:31 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On 10/15/2014 06:22 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote:
>> With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
>> enables tx interrupt only for the final skb in the chain if
>> needed. This helps mitigate tx interrupts.
>>
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> I think you should split xmit_more stuff out.
Not much difference, but if you prefer I will do this.
* Re: [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
2014-10-15 10:31 ` Jason Wang
@ 2014-10-15 10:46 ` Michael S. Tsirkin
0 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:46 UTC (permalink / raw)
To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 06:31:19PM +0800, Jason Wang wrote:
> On 10/15/2014 06:22 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote:
> >> With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
> >> enables tx interrupt only for the final skb in the chain if
> >> needed. This helps mitigate tx interrupts.
> >>
> >> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >> Cc: Michael S. Tsirkin <mst@redhat.com>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > I think you should split xmit_more stuff out.
>
> Not much difference, but if you prefer I will do this.
In fact I did it exactly like this already :)
--
MST
* Re: [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
2014-10-15 7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
` (5 preceding siblings ...)
2014-10-15 7:25 ` [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain Jason Wang
@ 2014-10-15 10:25 ` Michael S. Tsirkin
2014-10-15 11:14 ` Jason Wang
6 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:25 UTC (permalink / raw)
To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote:
> According to David, proper accounting and queueing (at all levels, not
> just TCP sockets) is more important than trying to skim a bunch of
> cycles by avoiding TX interrupts.
He also mentioned we should find other ways to batch
> Having an event to free the SKB is
> absolutely essential for the stack to operate correctly.
>
> This series tries to enable tx interrupt for virtio-net. The idea is
> simple: enable tx interrupt and schedule a tx napi to free old xmit
> skbs.
>
> Several notes:
> - Tx interrupt storm avoidance when queue is about to be full is
> kept.
But queue is typically *not* full. More important to avoid interrupt
storms in that case IMO.
> Since we may enable callbacks in both ndo_start_xmit() and tx
> napi, patch 1 adds a check to make sure the used event never goes
> backwards. This keeps the napi from wrongly re-enabling callbacks after
> delayed callbacks were used.
So why not just use delayed callbacks?
> - For bulk dequeuing, there's no need to enable tx interrupt for each
> packet. The last patch only enable tx interrupt for the final skb in
> the chain through xmit_more and a new helper to publish current avail
> idx as used event.
>
> This series fixes several issues of original rfc pointed out by Michael.
Could you list the issues, for ease of review?
>
> Smoke testing is done, and a complete netperf test will follow. Sending the
> series for early review.
>
> Thanks
>
> Jason Wang (6):
> virtio: make sure used event never go backwards
> virtio: introduce virtio_enable_cb_avail()
> virtio-net: small optimization on free_old_xmit_skbs()
> virtio-net: return the number of packets sent in free_old_xmit_skbs()
> virtio-net: enable tx interrupt
> virtio-net: enable tx interrupt only for the final skb in the chain
>
> drivers/net/virtio_net.c | 125 ++++++++++++++++++++++++++++++------------
> drivers/virtio/virtio_ring.c | 25 ++++++++-
> include/linux/virtio.h | 2 +
> 3 files changed, 115 insertions(+), 37 deletions(-)
* Re: [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
2014-10-15 10:25 ` [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Michael S. Tsirkin
@ 2014-10-15 11:14 ` Jason Wang
2014-10-15 11:58 ` Michael S. Tsirkin
0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 11:14 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On 10/15/2014 06:25 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote:
>> According to David, proper accounting and queueing (at all levels, not
>> just TCP sockets) is more important than trying to skim a bunch of
>> cycles by avoiding TX interrupts.
> He also mentioned we should find other ways to batch
>
Right.
>> Having an event to free the SKB is
>> absolutely essential for the stack to operate correctly.
>>
>> This series tries to enable tx interrupt for virtio-net. The idea is
>> simple: enable tx interrupt and schedule a tx napi to free old xmit
>> skbs.
>>
>> Several notes:
>> - Tx interrupt storm avoidance when queue is about to be full is
>> kept.
> But queue is typically *not* full. More important to avoid interrupt
> storms in that case IMO.
Yes.
>> Since we may enable callbacks in both ndo_start_xmit() and tx
>> napi, patch 1 adds a check to make sure the used event never goes
>> backwards. This keeps the napi from wrongly re-enabling callbacks after
>> delayed callbacks were used.
> So why not just use delayed callbacks?
This means the tx interrupts are coalesced in a somewhat adaptive way.
Need benchmarks to see the effect.
>
>> - For bulk dequeuing, there's no need to enable tx interrupt for each
>> packet. The last patch only enable tx interrupt for the final skb in
>> the chain through xmit_more and a new helper to publish current avail
>> idx as used event.
>>
>> This series fixes several issues of original rfc pointed out by Michael.
> Could you list the issues, for ease of review?
Probably just one:
- Move the virtqueue_disable_cb() from skb_xmit_done() into
virtnet_poll_tx() under tx lock.
* Re: [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
2014-10-15 11:14 ` Jason Wang
@ 2014-10-15 11:58 ` Michael S. Tsirkin
0 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 11:58 UTC (permalink / raw)
To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem
On Wed, Oct 15, 2014 at 07:14:14PM +0800, Jason Wang wrote:
> On 10/15/2014 06:25 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote:
> >> According to David, proper accounting and queueing (at all levels, not
> >> just TCP sockets) is more important than trying to skim a bunch of
> >> cycles by avoiding TX interrupts.
> > He also mentioned we should find other ways to batch
> >
>
> Right.
> >> Having an event to free the SKB is
> >> absolutely essential for the stack to operate correctly.
> >>
> >> This series tries to enable tx interrupt for virtio-net. The idea is
> >> simple: enable tx interrupt and schedule a tx napi to free old xmit
> >> skbs.
> >>
> >> Several notes:
> >> - Tx interrupt storm avoidance when queue is about to be full is
> >> kept.
> > But queue is typically *not* full. More important to avoid interrupt
> > storms in that case IMO.
>
> Yes.
> >> Since we may enable callbacks in both ndo_start_xmit() and tx
> >> napi, patch 1 adds a check to make sure the used event never goes
> >> backwards. This keeps the napi from wrongly re-enabling callbacks after
> >> delayed callbacks were used.
> > So why not just use delayed callbacks?
>
> This means the tx interrupts are coalesced in a somewhat adaptive way.
> Need benchmarks to see the effect.
I think it's a minimal change, and does not need new APIs.
If that's not optimal, we can do smarter things on top.
> >
> >> - For bulk dequeuing, there's no need to enable tx interrupt for each
> >> packet. The last patch only enable tx interrupt for the final skb in
> >> the chain through xmit_more and a new helper to publish current avail
> >> idx as used event.
> >>
> >> This series fixes several issues of original rfc pointed out by Michael.
> > Could you list the issues, for ease of review?
>
> Probably just one:
>
> - Move the virtqueue_disable_cb() from skb_xmit_done() into
> virtnet_poll_tx() under tx lock.
I think I did this already, I'll recheck.