* [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
@ 2014-08-23 20:28 David Miller
  2014-08-23 23:25 ` Alexander Duyck
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: David Miller @ 2014-08-23 20:28 UTC (permalink / raw)
  To: netdev; +Cc: therbert, jhs, hannes, edumazet, jeffrey.t.kirsher, rusty


Over time, and specifically and more recently at the Networking
Workshop during Kernel Summit in Chicago, we have discussed the idea
of having some way to optimize transmits of multiple TX packets at
a time.

There are several areas of overhead that could be amortized with such
schemes.  One has to do with locking and transactional overhead, the
other has to do with device specific costs.

This patch set here is more aimed at device specific costs.

Typically a device queues up a packet in the TX queue and then has to
do something to have the device start processing that new entry.
Sometimes this is composed of doing an MMIO write to a "tail"
register, and in other cases it can involve something as expensive as
a hypervisor call.

The basic setup defined here is that when the driver supports deferred
TX queue flushing, ndo_start_xmit should no longer perform that
operation.  Instead, a new operation, ndo_xmit_flush, should do it.

I have converted IGB and virtio_net as example initial users.  The IGB
conversion is tested; virtio_net is not, but it does compile :-)

All ndo_start_xmit call sites have been abstracted behind a new helper
called netdev_start_xmit().

This just adds the infrastructure; it does not yet add any
instances of doing multiple ndo_start_xmit calls per
ndo_xmit_flush invocation.

Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-23 20:28 [PATCH 0/3] Basic deferred TX queue flushing infrastructure David Miller
@ 2014-08-23 23:25 ` Alexander Duyck
  2014-08-24 12:58   ` Jesper Dangaard Brouer
  2014-08-24  3:39 ` Tom Herbert
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Alexander Duyck @ 2014-08-23 23:25 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: therbert, jhs, hannes, edumazet, jeffrey.t.kirsher, rusty

On 08/23/2014 01:28 PM, David Miller wrote:
> 
> Over time, and specifically and more recently at the Networking
> Workshop during Kernel Summit in Chicago, we have discussed the idea
> of having some way to optimize transmits of multiple TX packets at
> a time.
> 
> There are several areas of overhead that could be amortized with such
> schemes.  One has to do with locking and transactional overhead, the
> other has to do with device specific costs.
> 
> This patch set here is more aimed at device specific costs.
> 
> Typically a device queues up a packet in the TX queue and then has to
> do something to have the device start processing that new entry.
> Sometimes this is composed of doing an MMIO write to a "tail"
> register, and in other cases it can involve something as expensive as
> a hypervisor call.

The MMIO call isn't an issue until you encounter a locked operation, at
least on the x86 architecture.  So this often shows up in perf traces as a
hit on the qdisc lock right after completing a transmit.  I've seen it
at around 20% of CPU utilization when I was doing routing work with ixgbe.

Thanks,

Alex


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-23 20:28 [PATCH 0/3] Basic deferred TX queue flushing infrastructure David Miller
  2014-08-23 23:25 ` Alexander Duyck
@ 2014-08-24  3:39 ` Tom Herbert
  2014-08-24  4:26   ` David Miller
  2014-08-24  4:38 ` David Miller
  2014-08-25  6:10 ` David Miller
  3 siblings, 1 reply; 19+ messages in thread
From: Tom Herbert @ 2014-08-24  3:39 UTC (permalink / raw)
  To: David Miller
  Cc: Linux Netdev List, Jamal Hadi Salim, Hannes Frederic Sowa,
	Eric Dumazet, Jeff Kirsher, Rusty Russell

On Sat, Aug 23, 2014 at 1:28 PM, David Miller <davem@davemloft.net> wrote:
>
>
> Over time, and specifically and more recently at the Networking
> Workshop during Kernel Summit in Chicago, we have discussed the idea
> of having some way to optimize transmits of multiple TX packets at
> a time.
>
> There are several areas of overhead that could be amortized with such
> schemes.  One has to do with locking and transactional overhead, the
> other has to do with device specific costs.
>
> This patch set here is more aimed at device specific costs.
>
> Typically a device queues up a packet in the TX queue and then has to
> do something to have the device start processing that new entry.
> Sometimes this is composed of doing an MMIO write to a "tail"
> register, and in other cases it can involve something as expensive as
> a hypervisor call.
>
> The basic setup defined here is that when the driver supports deferred
> TX queue flushing, ndo_start_xmit should no longer perform that
> operation.  Instead a new operation, ndo_xmit_flush, should do it.
>
Interesting, but I'm still wondering about the MSG_MORE idea. If that is
passed with ndo_start_xmit then maybe a new ndo function might not be
needed. Also, since it's advisory, we could pass it to existing
drivers and let them decide whether to do anything with it.

> I have converted IGB and virtio_net as example initial users.  The IGB
> conversion is tested, virtio_net is not but it does compile :-)
>
> All ndo_start_xmit call sites have been abstracted behind a new helper
> called netdev_start_xmit().
>
> This just adds the infrastructure; it does not yet add any
> instances of doing multiple ndo_start_xmit calls per
> ndo_xmit_flush invocation.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-24  3:39 ` Tom Herbert
@ 2014-08-24  4:26   ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2014-08-24  4:26 UTC (permalink / raw)
  To: therbert; +Cc: netdev, jhs, hannes, edumazet, jeffrey.t.kirsher, rusty

From: Tom Herbert <therbert@google.com>
Date: Sat, 23 Aug 2014 20:39:05 -0700

> Interesting, but I'm still wondering about the MSG_MORE idea. If that is
> passed with ndo_start_xmit then maybe a new ndo function might not be
> needed. Also, since it's advisory, we could pass it to existing
> drivers and let them decide whether to do anything with it.

My thinking is that if we can have the flush generation logic in a
generic place, we should.

We really do not want every driver to have a unique approach to this
stuff.


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-23 20:28 [PATCH 0/3] Basic deferred TX queue flushing infrastructure David Miller
  2014-08-23 23:25 ` Alexander Duyck
  2014-08-24  3:39 ` Tom Herbert
@ 2014-08-24  4:38 ` David Miller
  2014-08-24 14:57   ` Jamal Hadi Salim
                     ` (2 more replies)
  2014-08-25  6:10 ` David Miller
  3 siblings, 3 replies; 19+ messages in thread
From: David Miller @ 2014-08-24  4:38 UTC (permalink / raw)
  To: netdev; +Cc: therbert, jhs, hannes, edumazet, jeffrey.t.kirsher, rusty

From: David Miller <davem@davemloft.net>
Date: Sat, 23 Aug 2014 13:28:11 -0700 (PDT)

> This just adds the infrastructure; it does not yet add any
> instances of doing multiple ndo_start_xmit calls per
> ndo_xmit_flush invocation.

So today I was looking more into this aspect.

Like Tom Herbert has mentioned we have all the infrastructure (sort
of) already to handle a list of SKBs going down into
dev_hard_start_xmit() via the GSO handling.

But that code is funny, because it keeps the original GSO head SKB
around as a placeholder to maintain the list of segmented SKBs.

So the list walker basically walks starting at skb->next.  That's
awkward for what we want to do, which is pass in an arbitrary list of
SKBs.

All it really wants that head SKB for is essentially list management,
which seems like overkill to me.

Anyways, this got me thinking that we should have something that
provides the segment list management and stop keeping that head GSO
SKB around.

Then we can make that "gso:" label list walker generic enough that we
could pass down arbitrary lists of SKBs from the qdisc_restart() path.

This list management seems to be the only reason why we keep the GSO
head SKB around after dev_gso_segment(); we should be able to free it
up early without any problems, right?

I'm also thinking about whether we should hang the generic SKB list
management off of the txq or the qdisc.  Right now the gso_skb thing
is in the qdisc.

Thoughts?


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-23 23:25 ` Alexander Duyck
@ 2014-08-24 12:58   ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2014-08-24 12:58 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: brouer, David Miller, netdev, therbert, jhs, hannes, edumazet,
	jeffrey.t.kirsher, rusty


On Sat, 23 Aug 2014 16:25:05 -0700
Alexander Duyck <alexander.duyck@gmail.com> wrote:

> On 08/23/2014 01:28 PM, David Miller wrote:
> > 
[...]
> > Typically a device queues up a packet in the TX queue and then has to
> > do something to have the device start processing that new entry.
> > Sometimes this is composed of doing an MMIO write to a "tail"
> > register, and in other cases it can involve something as expensive as
> > a hypervisor call.
> 
> The MMIO call isn't an issue until you encounter a locked operation, at
> least on the x86 architecture.  So this often shows up in perf traces as a
> hit on the qdisc lock right after completing a transmit.  I've seen it
> at around 20% of CPU utilization when I was doing routing work with ixgbe.

My experience is that perf cannot attribute the cost of these MMIO
writes correctly.  Perf will often blame the lock operations in the
qdisc system, but when I played with removing these qdisc and TXQ
locks, I found that perf would start blaming some asm instruction that
did not make sense (and adjusting the code would just make perf blame
another asm instruction that didn't make sense).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-24  4:38 ` David Miller
@ 2014-08-24 14:57   ` Jamal Hadi Salim
  2014-08-24 19:08     ` David Miller
  2014-08-24 17:37   ` Eric Dumazet
  2014-08-25 22:21   ` Cong Wang
  2 siblings, 1 reply; 19+ messages in thread
From: Jamal Hadi Salim @ 2014-08-24 14:57 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: therbert, hannes, edumazet, jeffrey.t.kirsher, rusty

On 08/24/14 00:38, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Sat, 23 Aug 2014 13:28:11 -0700 (PDT)
>
>> This just adds the infrastructure; it does not yet add any
>> instances of doing multiple ndo_start_xmit calls per
>> ndo_xmit_flush invocation.
>
> So today I was looking more into this aspect.
>
> Like Tom Herbert has mentioned we have all the infrastructure (sort
> of) already to handle a list of SKBs going down into
> dev_hard_start_xmit() via the GSO handling.
>
> But that code is funny, because it keeps the original GSO head SKB
> around as a placeholder to maintain the list of segmented SKBs.
>
> So the list walker basically walks starting at skb->next.  That's
> awkward for what we want to do, which is pass in an arbitrary list of
> SKBs.
>
> All it really wants that head SKB for is essentially list management,
> which seems like overkill to me.
>
> Anyways, this got me thinking that we should have something that
> provides the segment list management and stop keeping that head GSO
> SKB around.
>
> Then we can make that "gso:" label list walker generic enough that we
> could pass down arbitrary lists of SKBs from the qdisc_restart() path.
>
> This list management seems to be the only reason why we keep the GSO
> head SKB around after dev_gso_segment(), we should be able to free it
> up early without any problems right?
>
> I'm also thinking about whether we should hang the generic SKB list
> management off of the txq or the qdisc.  Right now the gso_skb thing
> is in the qdisc.
>
> Thoughts?

I would leave it where it is (moving it messes with accounting
otherwise) and eventually kill it once you have the batch interface
right.

Clearly, the best thing to do is to avoid the need for such a list
altogether. The only reason you need a list around is that you don't
know how much you can send to the driver. The tx_win idea may not be
the best, but it tried to address that issue.

cheers,
jamal


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-24  4:38 ` David Miller
  2014-08-24 14:57   ` Jamal Hadi Salim
@ 2014-08-24 17:37   ` Eric Dumazet
  2014-08-24 19:11     ` David Miller
  2014-08-25 22:21   ` Cong Wang
  2 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2014-08-24 17:37 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, therbert, jhs, hannes, edumazet, jeffrey.t.kirsher, rusty

On Sat, 2014-08-23 at 21:38 -0700, David Miller wrote:

> Like Tom Herbert has mentioned we have all the infrastructure (sort
> of) already to handle a list of SKBs going down into
> dev_hard_start_xmit() via the GSO handling.
> 
> But that code is funny, because it keeps the original GSO head SKB
> around as a placeholder to maintain the list of segmented SKBs.

This is because dev_hard_start_xmit() does not return an skb, but an rc
code.

int ret = dev_hard_start_xmit(skb, dev, txq);

That can be changed so that we do not have to keep the original skb as
the GSO head:

sch_direct_xmit() would 'requeue' the new skb (might be in the middle of
the skb segmented chain), not the original one.

This would get rid of DEV_GSO_CB() presumably.


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-24 14:57   ` Jamal Hadi Salim
@ 2014-08-24 19:08     ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2014-08-24 19:08 UTC (permalink / raw)
  To: jhs; +Cc: netdev, therbert, hannes, edumazet, jeffrey.t.kirsher, rusty

From: Jamal Hadi Salim <jhs@mojatatu.com>
Date: Sun, 24 Aug 2014 10:57:32 -0400

> Clearly, the best thing to do is avoid altogether the need to have
> such a list. The only reason you need a list around is because you
> don't know how much you can send to the driver. The tx_win idea may
> not be the best, but it tried to address that issue.

I think it is unavoidable for us to have to do a "send and recheck
queue state" approach, because of the issues I brought up the other
day.

Every driver has its own set of restrictions on how many TX slots
a given SKB can consume in its TX queue.

So from that perspective a list we perform retries on is an approach
we must consider seriously.


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-24 17:37   ` Eric Dumazet
@ 2014-08-24 19:11     ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2014-08-24 19:11 UTC (permalink / raw)
  To: eric.dumazet
  Cc: netdev, therbert, jhs, hannes, edumazet, jeffrey.t.kirsher, rusty

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 24 Aug 2014 10:37:48 -0700

> On Sat, 2014-08-23 at 21:38 -0700, David Miller wrote:
> 
>> Like Tom Herbert has mentioned we have all the infrastructure (sort
>> of) already to handle a list of SKBs going down into
>> dev_hard_start_xmit() via the GSO handling.
>> 
>> But that code is funny, because it keeps the original GSO head SKB
>> around as a placeholder to maintain the list of segmented SKBs.
> 
> This is because dev_hard_start_xmit() does not return an skb, but an rc
> code.
> 
> int ret = dev_hard_start_xmit(skb, dev, txq);
> 
> That can be changed so that we do not have to keep the original skb as
> the GSO head :
> 
> sch_direct_xmit() would 'requeue' the new skb (might be in the middle of
> the skb segmented chain), not the original one.
> 
> This would get rid of DEV_GSO_CB() presumably.

I considered this, but then there is the ugly issue of how to return
the netdev_tx_t code back into the call chain.

I hate passing things back by reference; it's always a symptom of bad
interface design :-/


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-23 20:28 [PATCH 0/3] Basic deferred TX queue flushing infrastructure David Miller
                   ` (2 preceding siblings ...)
  2014-08-24  4:38 ` David Miller
@ 2014-08-25  6:10 ` David Miller
  3 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2014-08-25  6:10 UTC (permalink / raw)
  To: netdev; +Cc: therbert, jhs, hannes, edumazet, jeffrey.t.kirsher, rusty


In the interest of having something to readily build upon, I pushed
my changes into net-next, with the bug that Alexander Duyck pointed
out fixed.

Don't panic; if this turns out to be the wrong interface we'll adjust
it or just completely revert my changes.


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-24  4:38 ` David Miller
  2014-08-24 14:57   ` Jamal Hadi Salim
  2014-08-24 17:37   ` Eric Dumazet
@ 2014-08-25 22:21   ` Cong Wang
  2014-08-25 22:31     ` David Miller
  2 siblings, 1 reply; 19+ messages in thread
From: Cong Wang @ 2014-08-25 22:21 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Tom Herbert, Jamal Hadi Salim, Hannes Frederic Sowa,
	Eric Dumazet, jeffrey.t.kirsher, rusty

On Sat, Aug 23, 2014 at 9:38 PM, David Miller <davem@davemloft.net> wrote:
>
> So today I was looking more into this aspect.
>
> Like Tom Herbert has mentioned we have all the infrastructure (sort
> of) already to handle a list of SKBs going down into
> dev_hard_start_xmit() via the GSO handling.
>
> But that code is funny, because it keeps the original GSO head SKB
> around as a placeholder to maintain the list of segmented SKBs.
>
> So the list walker basically walks starting at skb->next.  That's
> awkward for what we want to do, which is pass in an arbitrary list of
> SKBs.
>
> All it really wants that head SKB for is essentially list management,
> which seems like overkill to me.
>
> Anyways, this got me thinking that we should have something that
> provides the segment list management and stop keeping that head GSO
> SKB around.
>
> Then we can make that "gso:" label list walker generic enough that we
> could pass down arbitrary lists of SKBs from the qdisc_restart() path.
>
> This list management seems to be the only reason why we keep the GSO
> head SKB around after dev_gso_segment(), we should be able to free it
> up early without any problems right?
>
> I'm also thinking about whether we should hang the generic SKB list
> management off of the txq or the qdisc.  Right now the gso_skb thing
> is in the qdisc.
>
> Thoughts?


When I tried to unify the list management of SKBs, I was surprised to
see there are still some places relying on skb->next and skb->prev
being at the head of the skb struct; even though we have list APIs
nowadays, they still play some magic on these pointers (sctp and tipc
IIRC). This is why I gave up; maybe it's time to revisit this.

Talking about skb->next, fortunately we do GSO segmentation after
leaving the qdisc queues; otherwise it would be scary to play with
these pointers at the same time. I think all queues of SKBs use either
just ->next or both ->prev and ->next.

Just my two cents.


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-25 22:21   ` Cong Wang
@ 2014-08-25 22:31     ` David Miller
  2014-08-25 22:37       ` Eric Dumazet
                         ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: David Miller @ 2014-08-25 22:31 UTC (permalink / raw)
  To: cwang; +Cc: netdev, therbert, jhs, hannes, edumazet, jeffrey.t.kirsher, rusty

From: Cong Wang <cwang@twopensource.com>
Date: Mon, 25 Aug 2014 15:21:00 -0700

> When I tried to unify the list management of SKB's, I was surprised to see
> there are still some places relying on skb->next and skb->prev to be
> the head of the skb struct, since nowadays we have list API's, they still
> play some magic on these pointers (sctp and tipc IIRC). This is why I
> gave up, maybe it's time to revise this again.

I think SCTP should be OK, and yes I do remember that protocol being one of
the last subsystems making such SKB list pointer assumptions.

It was using list_*() operations on sk_buff objects or something like that.

> Talking about skb->next, fortunately we do gso segmentation after
> going out of qdisc queues, otherwise it's scary to play with these
> pointers at same time. I think all queues of SKB's are either using
> just ->next or both ->prev and ->next.

It occurs to me that perhaps the thing to do is to pass sk_buff ** to
dev_hard_start_xmit().

If it really is important to free the original GSO skb after the
segmented parts, we can run that as part of the destructor of the
final segment.


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-25 22:31     ` David Miller
@ 2014-08-25 22:37       ` Eric Dumazet
  2014-08-25 22:41         ` David Miller
  2014-08-25 22:45       ` Jon Maloy
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2014-08-25 22:37 UTC (permalink / raw)
  To: David Miller
  Cc: cwang, netdev, therbert, jhs, hannes, edumazet, jeffrey.t.kirsher, rusty

On Mon, 2014-08-25 at 15:31 -0700, David Miller wrote:

> It occurs to me that perhaps the thing to do is to pass sk_buff ** to
> dev_hard_start_xmit().

This was my suggestion ;)

int ret = dev_hard_start_xmit(skb, dev, txq);

->

int ret = dev_hard_start_xmit(&skb, dev, txq);


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-25 22:37       ` Eric Dumazet
@ 2014-08-25 22:41         ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2014-08-25 22:41 UTC (permalink / raw)
  To: eric.dumazet
  Cc: cwang, netdev, therbert, jhs, hannes, edumazet, jeffrey.t.kirsher, rusty

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 25 Aug 2014 15:37:54 -0700

> On Mon, 2014-08-25 at 15:31 -0700, David Miller wrote:
> 
>> It occurs to me that perhaps the thing to do is to pass sk_buff ** to
>> dev_hard_start_xmit().
> 
> This was my suggestion ;)
> 
> int ret = dev_hard_start_xmit(skb, dev, txq);
> 
> ->
> 
> int ret = dev_hard_start_xmit(&skb, dev, txq);

OK, and this can solve the GSO freeing issue too: if the caller is
working with a segmented SKB, it can pass &skb->next to
dev_hard_start_xmit(); then, if skb->next is NULL after
dev_hard_start_xmit() returns, we can free up the head GSO skb.


* RE: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-25 22:31     ` David Miller
  2014-08-25 22:37       ` Eric Dumazet
@ 2014-08-25 22:45       ` Jon Maloy
  2014-08-25 23:24       ` Vlad Yasevich
  2014-09-01  7:40       ` Jesper Dangaard Brouer
  3 siblings, 0 replies; 19+ messages in thread
From: Jon Maloy @ 2014-08-25 22:45 UTC (permalink / raw)
  To: David Miller, cwang
  Cc: netdev, therbert, jhs, hannes, edumazet, jeffrey.t.kirsher, rusty



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of David Miller
> Sent: August-25-14 6:32 PM
> To: cwang@twopensource.com
> Cc: netdev@vger.kernel.org; therbert@google.com; jhs@mojatatu.com;
> hannes@stressinduktion.org; edumazet@google.com;
> jeffrey.t.kirsher@intel.com; rusty@rustcorp.com.au
> Subject: Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
> 
> From: Cong Wang <cwang@twopensource.com>
> Date: Mon, 25 Aug 2014 15:21:00 -0700
> 
> > When I tried to unify the list management of SKB's, I was surprised to
> > see there are still some places relying on skb->next and skb->prev to
> > be the head of the skb struct, since nowadays we have list API's, they
> > still play some magic on these pointers (sctp and tipc IIRC). This is

I am not aware of any such assumptions in TIPC.  As far as I'm
concerned, you can go ahead with this.

Regards
///jon


> > why I gave up, maybe it's time to revise this again.
> 
> I think SCTP should be OK, and yes I do remember that protocol being one of
> the last subsystems making such SKB list pointer assumptions.
> 
> It was using list_*() operations on sk_buff objects or something like that.
> 
> > Talking about skb->next, fortunately we do gso segmentation after
> > going out of qdisc queues, otherwise it's scary to play with these
> > pointers at same time. I think all queues of SKB's are either using
> > just ->next or both ->prev and ->next.
> 
> It occurs to me that perhaps the thing to do is to pass sk_buff ** to
> dev_hard_start_xmit().
> 
> If it really is important to free the original GSO skb after the segmented parts,
> we can run that as part of the destructor of the final segment.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-25 22:31     ` David Miller
  2014-08-25 22:37       ` Eric Dumazet
  2014-08-25 22:45       ` Jon Maloy
@ 2014-08-25 23:24       ` Vlad Yasevich
  2014-09-01  7:40       ` Jesper Dangaard Brouer
  3 siblings, 0 replies; 19+ messages in thread
From: Vlad Yasevich @ 2014-08-25 23:24 UTC (permalink / raw)
  To: David Miller, cwang
  Cc: netdev, therbert, jhs, hannes, edumazet, jeffrey.t.kirsher, rusty

On 08/25/2014 06:31 PM, David Miller wrote:
> From: Cong Wang <cwang@twopensource.com>
> Date: Mon, 25 Aug 2014 15:21:00 -0700
> 
>> When I tried to unify the list management of SKB's, I was surprised to see
>> there are still some places relying on skb->next and skb->prev to be
>> the head of the skb struct, since nowadays we have list API's, they still
>> play some magic on these pointers (sctp and tipc IIRC). This is why I
>> gave up, maybe it's time to revise this again.
> 
> I think SCTP should be OK, and yes I do remember that protocol being one of
> the last subsystems making such SKB list pointer assumptions.
> 
> It was using list_*() operations on sk_buff objects or something like that.


All I see that's left is __skb_unlink, __skb_queue_tail, and
skb_queue_splice_tail_init().

I think you converted all of them a while ago.

-vlad

> 
>> Talking about skb->next, fortunately we do gso segmentation after
>> going out of qdisc queues, otherwise it's scary to play with these
>> pointers at same time. I think all queues of SKB's are either using
>> just ->next or both ->prev and ->next.
> 
> It occurs to me that perhaps the thing to do is to pass sk_buff ** to
> dev_hard_start_xmit().
> 
> If it really is important to free the original GSO skb after the
> segmented parts, we can run that as part of the destructor of the
> final segment.
> 


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-08-25 22:31     ` David Miller
                         ` (2 preceding siblings ...)
  2014-08-25 23:24       ` Vlad Yasevich
@ 2014-09-01  7:40       ` Jesper Dangaard Brouer
  2014-09-01 21:40         ` David Miller
  3 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2014-09-01  7:40 UTC (permalink / raw)
  To: David Miller
  Cc: brouer, cwang, netdev, therbert, jhs, hannes, edumazet,
	jeffrey.t.kirsher, rusty


On Mon, 25 Aug 2014 15:31:46 -0700 (PDT) David Miller <davem@davemloft.net> wrote:

> It occurs to me that perhaps the thing to do is to pass sk_buff ** to
> dev_hard_start_xmit().

Is this to send an array of skb's to dev_hard_start_xmit()?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
  2014-09-01  7:40       ` Jesper Dangaard Brouer
@ 2014-09-01 21:40         ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2014-09-01 21:40 UTC (permalink / raw)
  To: brouer
  Cc: cwang, netdev, therbert, jhs, hannes, edumazet, jeffrey.t.kirsher, rusty

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Mon, 1 Sep 2014 09:40:32 +0200

> 
> On Mon, 25 Aug 2014 15:31:46 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
> 
>> It occurs to me that perhaps the thing to do is to pass sk_buff ** to
>> dev_hard_start_xmit().
> 
> Is this to send an array of skb's to dev_hard_start_xmit()?

It's one possible mechanism.

I have a patch set that rearchitects the send path here which I'll
post in a little bit, which takes a slightly different approach
based upon suggestions by Eric Dumazet and Tom Herbert.


Thread overview: 19+ messages
2014-08-23 20:28 [PATCH 0/3] Basic deferred TX queue flushing infrastructure David Miller
2014-08-23 23:25 ` Alexander Duyck
2014-08-24 12:58   ` Jesper Dangaard Brouer
2014-08-24  3:39 ` Tom Herbert
2014-08-24  4:26   ` David Miller
2014-08-24  4:38 ` David Miller
2014-08-24 14:57   ` Jamal Hadi Salim
2014-08-24 19:08     ` David Miller
2014-08-24 17:37   ` Eric Dumazet
2014-08-24 19:11     ` David Miller
2014-08-25 22:21   ` Cong Wang
2014-08-25 22:31     ` David Miller
2014-08-25 22:37       ` Eric Dumazet
2014-08-25 22:41         ` David Miller
2014-08-25 22:45       ` Jon Maloy
2014-08-25 23:24       ` Vlad Yasevich
2014-09-01  7:40       ` Jesper Dangaard Brouer
2014-09-01 21:40         ` David Miller
2014-08-25  6:10 ` David Miller
