[net-next,RFC] pktgen: don't wait for the device who doesn't free skb immediately after sent
diff mbox series

Message ID 1353916612-6357-1-git-send-email-jasowang@redhat.com
State New, archived
Headers show
Series
  • [net-next,RFC] pktgen: don't wait for the device who doesn't free skb immediately after sent
Related show

Commit Message

Jason Wang Nov. 26, 2012, 7:56 a.m. UTC
Some deivces do not free the old tx skbs immediately after it has been sent
(usually in tx interrupt). One such example is virtio-net which optimizes for
virt and only free the possible old tx skbs during the next packet sending. This
would lead the pktgen to wait forever in the refcount of the skb if no other
pakcet will be sent afterwards.

Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY which could
notify the pktgen that the device does not free skb immediately after it has
been sent and let it not to wait for the refcount to be one.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---

Another choice is to introduce an new .ndo which could be called by pktgen
during the waiting to free the old tx skbs.

 drivers/net/virtio_net.c |    3 ++-
 include/uapi/linux/if.h  |    2 ++
 net/core/pktgen.c        |    3 ++-
 3 files changed, 6 insertions(+), 2 deletions(-)

Comments

Stephen Hemminger Nov. 26, 2012, 5:37 p.m. UTC | #1
On Mon, 26 Nov 2012 15:56:52 +0800
Jason Wang <jasowang@redhat.com> wrote:

> Some deivces do not free the old tx skbs immediately after it has been sent
> (usually in tx interrupt). One such example is virtio-net which optimizes for
> virt and only free the possible old tx skbs during the next packet sending. This
> would lead the pktgen to wait forever in the refcount of the skb if no other
> pakcet will be sent afterwards.
> 
> Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY which could
> notify the pktgen that the device does not free skb immediately after it has
> been sent and let it not to wait for the refcount to be one.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Another alternative would be using skb_orphan() and skb->destructor.
There are other cases where skb's are not freed right away.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Jason Wang Nov. 27, 2012, 6:45 a.m. UTC | #2
On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
> On Mon, 26 Nov 2012 15:56:52 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> Some deivces do not free the old tx skbs immediately after it has been sent
>> (usually in tx interrupt). One such example is virtio-net which optimizes for
>> virt and only free the possible old tx skbs during the next packet sending. This
>> would lead the pktgen to wait forever in the refcount of the skb if no other
>> pakcet will be sent afterwards.
>>
>> Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY which could
>> notify the pktgen that the device does not free skb immediately after it has
>> been sent and let it not to wait for the refcount to be one.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Another alternative would be using skb_orphan() and skb->destructor.
> There are other cases where skb's are not freed right away.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hi Stephen:

Do you mean registering a skb->destructor for pktgen then set and check 
bits in skb->tx_flag?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Stephen Hemminger Nov. 27, 2012, 4:49 p.m. UTC | #3
On Tue, 27 Nov 2012 14:45:13 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
> > On Mon, 26 Nov 2012 15:56:52 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >
> >> Some deivces do not free the old tx skbs immediately after it has been sent
> >> (usually in tx interrupt). One such example is virtio-net which optimizes for
> >> virt and only free the possible old tx skbs during the next packet sending. This
> >> would lead the pktgen to wait forever in the refcount of the skb if no other
> >> pakcet will be sent afterwards.
> >>
> >> Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY which could
> >> notify the pktgen that the device does not free skb immediately after it has
> >> been sent and let it not to wait for the refcount to be one.
> >>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > Another alternative would be using skb_orphan() and skb->destructor.
> > There are other cases where skb's are not freed right away.
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Hi Stephen:
> 
> Do you mean registering a skb->destructor for pktgen then set and check 
> bits in skb->tx_flag?

Yes. Register a destructor that does something like update a counter (number of packets pending),
then just spin while number of packets pending is over threshold.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Jason Wang Nov. 28, 2012, 6:48 a.m. UTC | #4
On 11/28/2012 12:49 AM, Stephen Hemminger wrote:
> On Tue, 27 Nov 2012 14:45:13 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
>>> On Mon, 26 Nov 2012 15:56:52 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>
>>>> Some deivces do not free the old tx skbs immediately after it has been sent
>>>> (usually in tx interrupt). One such example is virtio-net which optimizes for
>>>> virt and only free the possible old tx skbs during the next packet sending. This
>>>> would lead the pktgen to wait forever in the refcount of the skb if no other
>>>> pakcet will be sent afterwards.
>>>>
>>>> Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY which could
>>>> notify the pktgen that the device does not free skb immediately after it has
>>>> been sent and let it not to wait for the refcount to be one.
>>>>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> Another alternative would be using skb_orphan() and skb->destructor.
>>> There are other cases where skb's are not freed right away.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Hi Stephen:
>>
>> Do you mean registering a skb->destructor for pktgen then set and check
>> bits in skb->tx_flag?
> Yes. Register a destructor that does something like update a counter (number of packets pending),
> then just spin while number of packets pending is over threshold.
> --

Not sure this is the best method, since pktgen was used to test the tx 
process of the device driver and NIC. If we use skb_orhpan(), we would 
miss the test of tx completion part.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Stephen Hemminger Nov. 28, 2012, 4:53 p.m. UTC | #5
On Wed, 28 Nov 2012 14:48:52 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 11/28/2012 12:49 AM, Stephen Hemminger wrote:
> > On Tue, 27 Nov 2012 14:45:13 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >
> >> On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
> >>> On Mon, 26 Nov 2012 15:56:52 +0800
> >>> Jason Wang <jasowang@redhat.com> wrote:
> >>>
> >>>> Some deivces do not free the old tx skbs immediately after it has been sent
> >>>> (usually in tx interrupt). One such example is virtio-net which optimizes for
> >>>> virt and only free the possible old tx skbs during the next packet sending. This
> >>>> would lead the pktgen to wait forever in the refcount of the skb if no other
> >>>> pakcet will be sent afterwards.
> >>>>
> >>>> Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY which could
> >>>> notify the pktgen that the device does not free skb immediately after it has
> >>>> been sent and let it not to wait for the refcount to be one.
> >>>>
> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>> Another alternative would be using skb_orphan() and skb->destructor.
> >>> There are other cases where skb's are not freed right away.
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> Hi Stephen:
> >>
> >> Do you mean registering a skb->destructor for pktgen then set and check
> >> bits in skb->tx_flag?
> > Yes. Register a destructor that does something like update a counter (number of packets pending),
> > then just spin while number of packets pending is over threshold.
> > --
> 
> Not sure this is the best method, since pktgen was used to test the tx 
> process of the device driver and NIC. If we use skb_orhpan(), we would 
> miss the test of tx completion part.

There are other places that delay freeing and your solution would mean finding
and fixing all those. Code that does that already has to use skb_orphan() to
work, and I was looking for a way that could use that. Introducing another flag
value seems like a long term burden.

Alternatively, virtio could do cleanup more aggressively. Maybe in response to
ring getting half full, or add a cleanup timer or something to avoid the problem.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Jason Wang Nov. 29, 2012, 10:13 a.m. UTC | #6
On Wednesday, November 28, 2012 08:53:05 AM Stephen Hemminger wrote:
> On Wed, 28 Nov 2012 14:48:52 +0800
> 
> Jason Wang <jasowang@redhat.com> wrote:
> > On 11/28/2012 12:49 AM, Stephen Hemminger wrote:
> > > On Tue, 27 Nov 2012 14:45:13 +0800
> > > 
> > > Jason Wang <jasowang@redhat.com> wrote:
> > >> On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
> > >>> On Mon, 26 Nov 2012 15:56:52 +0800
> > >>> 
> > >>> Jason Wang <jasowang@redhat.com> wrote:
> > >>>> Some deivces do not free the old tx skbs immediately after it has
> > >>>> been sent
> > >>>> (usually in tx interrupt). One such example is virtio-net which
> > >>>> optimizes for virt and only free the possible old tx skbs during the
> > >>>> next packet sending. This would lead the pktgen to wait forever in
> > >>>> the refcount of the skb if no other pakcet will be sent afterwards.
> > >>>> 
> > >>>> Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY
> > >>>> which could notify the pktgen that the device does not free skb
> > >>>> immediately after it has been sent and let it not to wait for the
> > >>>> refcount to be one.
> > >>>> 
> > >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > >>> 
> > >>> Another alternative would be using skb_orphan() and skb->destructor.
> > >>> There are other cases where skb's are not freed right away.
> > >>> --
> > >>> To unsubscribe from this list: send the line "unsubscribe netdev" in
> > >>> the body of a message to majordomo@vger.kernel.org
> > >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >> 
> > >> Hi Stephen:
> > >> 
> > >> Do you mean registering a skb->destructor for pktgen then set and check
> > >> bits in skb->tx_flag?
> > > 
> > > Yes. Register a destructor that does something like update a counter
> > > (number of packets pending), then just spin while number of packets
> > > pending is over threshold.
> > > --
> > 
> > Not sure this is the best method, since pktgen was used to test the tx
> > process of the device driver and NIC. If we use skb_orhpan(), we would
> > miss the test of tx completion part.
> 
> There are other places that delay freeing and your solution would mean
> finding and fixing all those. Code that does that already has to use
> skb_orphan() to work, and I was looking for a way that could use that.
> Introducing another flag value seems like a long term burden.
> 

Get the point, will draft another version.
> Alternatively, virtio could do cleanup more aggressively. Maybe in response
> to ring getting half full, or add a cleanup timer or something to avoid the
> problem.

May worth to try. Another method is that virtio has a feature to notify guest 
when tx ring is empty, we could free the old tx skbs there. But it may brings 
extra overhead. If we could let virtio_net free the old tx skb timely, it 
would be easier to bring BQL support to virtio_net also.

Thanks
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Michael S. Tsirkin Nov. 29, 2012, 11:30 a.m. UTC | #7
On Thu, Nov 29, 2012 at 06:13:13PM +0800, Jason Wang wrote:
> On Wednesday, November 28, 2012 08:53:05 AM Stephen Hemminger wrote:
> > On Wed, 28 Nov 2012 14:48:52 +0800
> > 
> > Jason Wang <jasowang@redhat.com> wrote:
> > > On 11/28/2012 12:49 AM, Stephen Hemminger wrote:
> > > > On Tue, 27 Nov 2012 14:45:13 +0800
> > > > 
> > > > Jason Wang <jasowang@redhat.com> wrote:
> > > >> On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
> > > >>> On Mon, 26 Nov 2012 15:56:52 +0800
> > > >>> 
> > > >>> Jason Wang <jasowang@redhat.com> wrote:
> > > >>>> Some deivces do not free the old tx skbs immediately after it has
> > > >>>> been sent
> > > >>>> (usually in tx interrupt). One such example is virtio-net which
> > > >>>> optimizes for virt and only free the possible old tx skbs during the
> > > >>>> next packet sending. This would lead the pktgen to wait forever in
> > > >>>> the refcount of the skb if no other pakcet will be sent afterwards.
> > > >>>> 
> > > >>>> Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY
> > > >>>> which could notify the pktgen that the device does not free skb
> > > >>>> immediately after it has been sent and let it not to wait for the
> > > >>>> refcount to be one.
> > > >>>> 
> > > >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > >>> 
> > > >>> Another alternative would be using skb_orphan() and skb->destructor.
> > > >>> There are other cases where skb's are not freed right away.
> > > >>> --
> > > >>> To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > >>> the body of a message to majordomo@vger.kernel.org
> > > >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > >> 
> > > >> Hi Stephen:
> > > >> 
> > > >> Do you mean registering a skb->destructor for pktgen then set and check
> > > >> bits in skb->tx_flag?
> > > > 
> > > > Yes. Register a destructor that does something like update a counter
> > > > (number of packets pending), then just spin while number of packets
> > > > pending is over threshold.
> > > > --
> > > 
> > > Not sure this is the best method, since pktgen was used to test the tx
> > > process of the device driver and NIC. If we use skb_orhpan(), we would
> > > miss the test of tx completion part.
> > 
> > There are other places that delay freeing and your solution would mean
> > finding and fixing all those. Code that does that already has to use
> > skb_orphan() to work, and I was looking for a way that could use that.
> > Introducing another flag value seems like a long term burden.
> > 
> 
> Get the point, will draft another version.
>
> > Alternatively, virtio could do cleanup more aggressively. Maybe in response
> > to ring getting half full, or add a cleanup timer or something to avoid the
> > problem.

Timer would prevent complete deadlock but it is very expensive
in the virt scenario.
pulling at ring half full would only help if ring gets half full :)
which it does not have to.

> 
> May worth to try. Another method is that virtio has a feature to notify guest 
> when tx ring is empty, we could free the old tx skbs there.
> But it may brings 
> extra overhead. If we could let virtio_net free the old tx skb timely, it 
> would be easier to bring BQL support to virtio_net also.
> 
> Thanks

We used to use notify on empty - it's still very slow.

> > 
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Stephen Hemminger Nov. 29, 2012, 5:21 p.m. UTC | #8
On Thu, 29 Nov 2012 13:30:13 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Nov 29, 2012 at 06:13:13PM +0800, Jason Wang wrote:
> > On Wednesday, November 28, 2012 08:53:05 AM Stephen Hemminger wrote:
> > > On Wed, 28 Nov 2012 14:48:52 +0800
> > > 
> > > Jason Wang <jasowang@redhat.com> wrote:
> > > > On 11/28/2012 12:49 AM, Stephen Hemminger wrote:
> > > > > On Tue, 27 Nov 2012 14:45:13 +0800
> > > > > 
> > > > > Jason Wang <jasowang@redhat.com> wrote:
> > > > >> On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
> > > > >>> On Mon, 26 Nov 2012 15:56:52 +0800
> > > > >>> 
> > > > >>> Jason Wang <jasowang@redhat.com> wrote:
> > > > >>>> Some deivces do not free the old tx skbs immediately after it has
> > > > >>>> been sent
> > > > >>>> (usually in tx interrupt). One such example is virtio-net which
> > > > >>>> optimizes for virt and only free the possible old tx skbs during the
> > > > >>>> next packet sending. This would lead the pktgen to wait forever in
> > > > >>>> the refcount of the skb if no other pakcet will be sent afterwards.
> > > > >>>> 
> > > > >>>> Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY
> > > > >>>> which could notify the pktgen that the device does not free skb
> > > > >>>> immediately after it has been sent and let it not to wait for the
> > > > >>>> refcount to be one.
> > > > >>>> 
> > > > >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > >>> 
> > > > >>> Another alternative would be using skb_orphan() and skb->destructor.
> > > > >>> There are other cases where skb's are not freed right away.
> > > > >>> --
> > > > >>> To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > > >>> the body of a message to majordomo@vger.kernel.org
> > > > >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > >> 
> > > > >> Hi Stephen:
> > > > >> 
> > > > >> Do you mean registering a skb->destructor for pktgen then set and check
> > > > >> bits in skb->tx_flag?
> > > > > 
> > > > > Yes. Register a destructor that does something like update a counter
> > > > > (number of packets pending), then just spin while number of packets
> > > > > pending is over threshold.
> > > > > --
> > > > 
> > > > Not sure this is the best method, since pktgen was used to test the tx
> > > > process of the device driver and NIC. If we use skb_orhpan(), we would
> > > > miss the test of tx completion part.
> > > 
> > > There are other places that delay freeing and your solution would mean
> > > finding and fixing all those. Code that does that already has to use
> > > skb_orphan() to work, and I was looking for a way that could use that.
> > > Introducing another flag value seems like a long term burden.
> > > 
> > 
> > Get the point, will draft another version.
> >
> > > Alternatively, virtio could do cleanup more aggressively. Maybe in response
> > > to ring getting half full, or add a cleanup timer or something to avoid the
> > > problem.
> 
> Timer would prevent complete deadlock but it is very expensive
> in the virt scenario.
> pulling at ring half full would only help if ring gets half full :)
> which it does not have to.

A timer that fires once (when idle) to mop up the last packet in a stream
would be not very expensive. Most of the time, it would just be continually
rescheduled.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Jason Wang Dec. 3, 2012, 6:45 a.m. UTC | #9
On Tuesday, November 27, 2012 08:49:19 AM Stephen Hemminger wrote:
> On Tue, 27 Nov 2012 14:45:13 +0800
> 
> Jason Wang <jasowang@redhat.com> wrote:
> > On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
> > > On Mon, 26 Nov 2012 15:56:52 +0800
> > > 
> > > Jason Wang <jasowang@redhat.com> wrote:
> > >> Some deivces do not free the old tx skbs immediately after it has been
> > >> sent
> > >> (usually in tx interrupt). One such example is virtio-net which
> > >> optimizes for virt and only free the possible old tx skbs during the
> > >> next packet sending. This would lead the pktgen to wait forever in the
> > >> refcount of the skb if no other pakcet will be sent afterwards.
> > >> 
> > >> Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY
> > >> which could notify the pktgen that the device does not free skb
> > >> immediately after it has been sent and let it not to wait for the
> > >> refcount to be one.
> > >> 
> > >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > 
> > > Another alternative would be using skb_orphan() and skb->destructor.
> > > There are other cases where skb's are not freed right away.
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > Hi Stephen:
> > 
> > Do you mean registering a skb->destructor for pktgen then set and check
> > bits in skb->tx_flag?
> 
> Yes. Register a destructor that does something like update a counter (number
> of packets pending), then just spin while number of packets pending is over
> threshold.

Have some experiments on this, looks like it does not work weel when clone_skb 
is used. For driver that call skb_orphan() in ndo_start_xmit, the destructor 
is only called when the first packet were sent, but what we need to know is 
when the last were sent. Any thoughts on this or we can just introduce another 
flag (anyway we have something like IFF_TX_SKB_SHARING) ?

Thanks
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Stephen Hemminger Dec. 3, 2012, 4:01 p.m. UTC | #10
On Mon, 03 Dec 2012 14:45:46 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On Tuesday, November 27, 2012 08:49:19 AM Stephen Hemminger wrote:
> > On Tue, 27 Nov 2012 14:45:13 +0800
> > 
> > Jason Wang <jasowang@redhat.com> wrote:
> > > On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
> > > > On Mon, 26 Nov 2012 15:56:52 +0800
> > > > 
> > > > Jason Wang <jasowang@redhat.com> wrote:
> > > >> Some deivces do not free the old tx skbs immediately after it has been
> > > >> sent
> > > >> (usually in tx interrupt). One such example is virtio-net which
> > > >> optimizes for virt and only free the possible old tx skbs during the
> > > >> next packet sending. This would lead the pktgen to wait forever in the
> > > >> refcount of the skb if no other pakcet will be sent afterwards.
> > > >> 
> > > >> Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY
> > > >> which could notify the pktgen that the device does not free skb
> > > >> immediately after it has been sent and let it not to wait for the
> > > >> refcount to be one.
> > > >> 
> > > >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > 
> > > > Another alternative would be using skb_orphan() and skb->destructor.
> > > > There are other cases where skb's are not freed right away.
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > > Hi Stephen:
> > > 
> > > Do you mean registering a skb->destructor for pktgen then set and check
> > > bits in skb->tx_flag?
> > 
> > Yes. Register a destructor that does something like update a counter (number
> > of packets pending), then just spin while number of packets pending is over
> > threshold.
> 
> Have some experiments on this, looks like it does not work weel when clone_skb 
> is used. For driver that call skb_orphan() in ndo_start_xmit, the destructor 
> is only called when the first packet were sent, but what we need to know is 
> when the last were sent. Any thoughts on this or we can just introduce another 
> flag (anyway we have something like IFF_TX_SKB_SHARING) ?
> 

The SKB_SHARING flag looks like the best solution then.
Surprisingly, transmit buffer completion is a major bottleneck for 10G
devices, and I suspect more changes will come.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Jason Wang Dec. 4, 2012, 12:55 p.m. UTC | #11
On Monday, December 03, 2012 08:01:11 AM Stephen Hemminger wrote:
> On Mon, 03 Dec 2012 14:45:46 +0800
> 
> Jason Wang <jasowang@redhat.com> wrote:
> > On Tuesday, November 27, 2012 08:49:19 AM Stephen Hemminger wrote:
> > > On Tue, 27 Nov 2012 14:45:13 +0800
> > > 
> > > Jason Wang <jasowang@redhat.com> wrote:
> > > > On 11/27/2012 01:37 AM, Stephen Hemminger wrote:
> > > > > On Mon, 26 Nov 2012 15:56:52 +0800
> > > > > 
> > > > > Jason Wang <jasowang@redhat.com> wrote:
> > > > >> Some deivces do not free the old tx skbs immediately after it has
> > > > >> been
> > > > >> sent
> > > > >> (usually in tx interrupt). One such example is virtio-net which
> > > > >> optimizes for virt and only free the possible old tx skbs during
> > > > >> the
> > > > >> next packet sending. This would lead the pktgen to wait forever in
> > > > >> the
> > > > >> refcount of the skb if no other pakcet will be sent afterwards.
> > > > >> 
> > > > >> Solving this issue by introducing a new flag IFF_TX_SKB_FREE_DELAY
> > > > >> which could notify the pktgen that the device does not free skb
> > > > >> immediately after it has been sent and let it not to wait for the
> > > > >> refcount to be one.
> > > > >> 
> > > > >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > 
> > > > > Another alternative would be using skb_orphan() and skb->destructor.
> > > > > There are other cases where skb's are not freed right away.
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > > > the body of a message to majordomo@vger.kernel.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > 
> > > > Hi Stephen:
> > > > 
> > > > Do you mean registering a skb->destructor for pktgen then set and
> > > > check
> > > > bits in skb->tx_flag?
> > > 
> > > Yes. Register a destructor that does something like update a counter
> > > (number of packets pending), then just spin while number of packets
> > > pending is over threshold.
> > 
> > Have some experiments on this, looks like it does not work weel when
> > clone_skb is used. For driver that call skb_orphan() in ndo_start_xmit,
> > the destructor is only called when the first packet were sent, but what
> > we need to know is when the last were sent. Any thoughts on this or we
> > can just introduce another flag (anyway we have something like
> > IFF_TX_SKB_SHARING) ?
> 
> The SKB_SHARING flag looks like the best solution then.
> Surprisingly, transmit buffer completion is a major bottleneck for 10G
> devices, and I suspect more changes will come.

It works, but we may lose some chances to use clone_skb and stress the device 
and driver more. I'm thinking maybe we can turn back to my original RFC to 
introduce another flag. This flag maybe also useful for BQL and zerocopy in the 
future since both of them are sensitive to the transmit buffer completion. 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Patch
diff mbox series

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 26c502e..8262232 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1058,7 +1058,8 @@  static int virtnet_probe(struct virtio_device *vdev)
 		return -ENOMEM;
 
 	/* Set up network device as normal. */
-	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE;
+	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE |
+			   IFF_TX_SKB_FREE_DELAY;
 	dev->netdev_ops = &virtnet_netdev;
 	dev->features = NETIF_F_HIGHDMA;
 
diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
index 1ec407b..a0d2168 100644
--- a/include/uapi/linux/if.h
+++ b/include/uapi/linux/if.h
@@ -83,6 +83,8 @@ 
 #define IFF_SUPP_NOFCS	0x80000		/* device supports sending custom FCS */
 #define IFF_LIVE_ADDR_CHANGE 0x100000	/* device supports hardware address
 					 * change when it's running */
+#define IFF_TX_SKB_FREE_DELAY 0x200000	/* device does free the tx skb
+					 * immediately when it is sent */
 
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index b29dacf..85d4e53 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3269,7 +3269,8 @@  unlock:
 
 	/* If pkt_dev->count is zero, then run forever */
 	if ((pkt_dev->count != 0) && (pkt_dev->sofar >= pkt_dev->count)) {
-		pktgen_wait_for_skb(pkt_dev);
+		if (!(pkt_dev->odev->priv_flags & IFF_TX_SKB_FREE_DELAY))
+			pktgen_wait_for_skb(pkt_dev);
 
 		/* Done with this */
 		pktgen_stop_device(pkt_dev);