* [RFC PATCH v2 0/9] bql: Byte Queue Limits
@ 2011-08-08  4:43 Tom Herbert
  2011-08-08 13:06 ` jamal
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Tom Herbert @ 2011-08-08  4:43 UTC (permalink / raw)
  To: davem, netdev

Changes from last version:
- Simplified and generalized driver interface.  Drivers need to
  implement two functions (a usage sketch follows this list):
    netdev_tx_completed_queue: Called at end of transmit completion
      to inform stack of number of bytes and packets processed.
    netdev_tx_sent_queue: Called to inform stack when packets are
      queued.

    netdev_tx_reset_queue: optional, resets state in the stack

- Added new per-queue flags that allow the stack to stop a queue
  separately from the driver doing this.  Drivers continue using the
  same functions to stop queues, but there are two functions that
  the stack calls (to check whether a queue has been stopped by the
  driver or the stack):

  netif_xmit_stopped, netif_xmit_frozen_or_stopped

- Added example support for bnx2x and sfc (demonstrates operation over
  multi-queue)

- Removed BQL being under CONFIG_RPS (didn't add CONFIG_BQL)

- Still needs some more testing, including showing benefits to high
  priority packets in QoS.
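
As a usage sketch of the driver interface above (hedged: the foo_*
names, the ring structure and the reaping helper are hypothetical;
only the netdev_tx_* calls come from this series):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct foo_ring {
        struct netdev_queue *txq;       /* stack queue backing this ring */
        /* ... hardware descriptor state ... */
};

/* Hypothetical helper: pop the next skb whose descriptor completed. */
struct sk_buff *foo_next_completed_skb(struct foo_ring *ring);

static netdev_tx_t foo_start_xmit(struct sk_buff *skb,
                                  struct net_device *dev)
{
        struct netdev_queue *txq =
                netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

        /* ... post the skb to the hardware ring here ... */

        /* Report the bytes actually handed to the hardware. */
        netdev_tx_sent_queue(txq, skb->len);

        return NETDEV_TX_OK;
}

static void foo_tx_complete(struct foo_ring *ring)
{
        unsigned int pkts = 0, bytes = 0;
        struct sk_buff *skb;

        /* Reap everything completed in this run, batching the totals. */
        while ((skb = foo_next_completed_skb(ring)) != NULL) {
                pkts++;
                bytes += skb->len;
                dev_kfree_skb_any(skb);
        }

        /* One call per completion run drives the limit recalculation. */
        netdev_tx_completed_queue(ring->txq, pkts, bytes);
}
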
----

This patch series implements byte queue limits (bql) for NIC TX queues.

Byte queue limits are a mechanism to limit the size of the transmit
hardware queue on a NIC by number of bytes. The goal of these byte
limits is to reduce latency caused by excessive queuing in hardware
without sacrificing throughput.

Hardware queuing limits are typically specified in terms of a number of
hardware descriptors, each of which has a variable size. The size of
individual queued items can vary over a very wide range. For
instance with the e1000 NIC the size could range from 64 bytes to 4K
(with TSO enabled). This variability makes it next to impossible to
choose a single queue limit that prevents starvation and provides the
lowest possible latency.

The objective of byte queue limits is to set the limit to be the
minimum needed to prevent starvation between successive transmissions to
the hardware. The latency between two transmissions can be variable in a
system. It is dependent on interrupt frequency, NAPI polling latencies,
scheduling of the queuing discipline, lock contention, etc. Therefore we
propose that byte queue limits should be dynamic and change in
accordance with the networking stack latencies a system encounters.

Patches to implement this:
Patch 1: Dynamic queue limits (dql) library.  This provides the general
queuing algorithm.
Patch 2: netdev changes that use dql to support byte queue limits.
Patch 3: Support in the forcedeth driver for byte queue limits.
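
A hedged sketch of the dql cycle this builds on, assuming the library
has the shape of dql_queued()/dql_avail()/dql_completed() calls (the
names here are assumptions; the foo_* helpers are hypothetical):

#include <linux/dynamic_queue_limits.h>

void foo_stop_queue(void);      /* hypothetical driver helpers */
void foo_wake_queue(void);

/* Producer: account bytes as they are posted to the hardware and
 * stop the queue once the dynamic limit is reached. */
static void foo_account_sent(struct dql *dql, unsigned int bytes)
{
        dql_queued(dql, bytes);
        if (dql_avail(dql) < 0)
                foo_stop_queue();
}

/* Consumer: called once per completion run.  dql_completed() is where
 * the limit adapts: it grows if the queue ran dry between completions
 * (starvation) and shrinks, after a hold time, if there was
 * persistent slack. */
static void foo_account_completed(struct dql *dql, unsigned int bytes)
{
        dql_completed(dql, bytes);
        if (dql_avail(dql) >= 0)
                foo_wake_queue();
}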

The effects of BQL are demonstrated in the benchmark results below.
These were made running 200 streams of netperf RR tests:

140000 rr size
BQL: 80-215K bytes in queue, 856 tps, 3.26% cpu
No BQL: 2700-2930K bytes in queue, 854 tps, 3.71% cpu

14000 rr size
BQL: 25-55K bytes in queue, 8500 tps
No BQL: 1500-1622K bytes in queue, 8523 tps, 4.53% cpu

1400 rr size
BQL: 20-38K bytes in queue, 86582 tps, 7.38% cpu
No BQL: 29-117K bytes in queue, 85738 tps, 7.67% cpu

140 rr size
BQL: 1-10K bytes in queue, 320540 tps, 34.6% cpu
No BQL: 1-13K bytes in queue, 323158 tps, 37.16% cpu

1 rr size
BQL: 0-3K bytes in queue, 338811 tps, 41.41% cpu
No BQL: 0-3K bytes in queue, 339947 tps, 42.36% cpu

The amount of queuing in the NIC is reduced by up to 90%, and I haven't
yet seen a consistent negative impact in terms of throughput or
CPU utilization.


* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-08  4:43 [RFC PATCH v2 0/9] bql: Byte Queue Limits Tom Herbert
@ 2011-08-08 13:06 ` jamal
  2011-08-08 13:18   ` Johannes Berg
  2011-08-25 15:19   ` Tom Herbert
  2011-08-08 15:40 ` Stephen Hemminger
  2011-08-12 18:20 ` Stephen Hemminger
  2 siblings, 2 replies; 19+ messages in thread
From: jamal @ 2011-08-08 13:06 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, Johannes Berg

Hi Tom,

Where are the throughput numbers? I think the tps may be a sufficient
proof point for latency.
No comment on the code but on the concept.
Essentially what you are implementing is two things in one:
- byte counting (instead of the packet counting we use at that layer)
- pseudo active queue management

For wired connections I think the big deal is in improved
runtime memory saving (your perf numbers are kinda ok).
The challenge is going to be with wireless where the underlying
bandwidth changes (and therefore the optimal queue size varies
more frequently). The problem with active queue management is
getting the feedback loop to be more accurate, and I think there
will be challenges with wired devices.
I notice that you don't have any wireless devices;
but it would be nice for someone to check this out on wireless.
CCing Johannes - maybe he has some insight.

cheers,
jamal





* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-08 13:06 ` jamal
@ 2011-08-08 13:18   ` Johannes Berg
  2011-08-25 15:29     ` Tom Herbert
  2011-08-25 15:19   ` Tom Herbert
  1 sibling, 1 reply; 19+ messages in thread
From: Johannes Berg @ 2011-08-08 13:18 UTC (permalink / raw)
  To: jhs; +Cc: Tom Herbert, davem, netdev

Thanks for the Cc Jamal.

On Mon, 2011-08-08 at 09:06 -0400, jamal wrote:

> The challenge is going to be with wireless where the underlying
> bandwidth changes (and therefore the optimal queue size varies
> more frequently). The problem with active queue management is
> getting the feedback loop to be more accurate, and I think there
> will be challenges with wired devices.
> I notice that you don't have any wireless devices;
> but it would be nice for someone to check this out on wireless.
> CCing Johannes - maybe he has some insight.

Well, the wireless case is curious, and has a whole bunch of corner
cases, since it's not necessarily PtP; it can be PtMP!

But considering the most basic case of us being a client connecting to
an AP first: yes, the bandwidth will change dynamically, I don't know
what impact this has on BQL, Tom, maybe you can think about this a bit?

The second big challenge in wireless is the PtMP case: if we're acting
as an AP, then we typically have four queues for any number of remote
endpoints with varying bandwidth. I haven't found a good way to handle
this: we can't have hardware queues per station (most HW is simply not
capable of that many queues) but technically we would want to make the
queue limits depend on the peer...

Since I just returned from vacation I have tons of email to dig through,
so I'll have to keep this short for now, but I'm definitely interested.

johannes



* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-08  4:43 [RFC PATCH v2 0/9] bql: Byte Queue Limits Tom Herbert
  2011-08-08 13:06 ` jamal
@ 2011-08-08 15:40 ` Stephen Hemminger
  2011-08-08 17:51   ` Tom Herbert
  2011-08-12 18:20 ` Stephen Hemminger
  2 siblings, 1 reply; 19+ messages in thread
From: Stephen Hemminger @ 2011-08-08 15:40 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev

On Sun, 7 Aug 2011 21:43:13 -0700 (PDT)
Tom Herbert <therbert@google.com> wrote:

>     netdev_tx_completed_queue: Called at end of transmit completion
>       to inform stack of number of bytes and packets processed.
>     netdev_tx_sent_queue: Called to inform stack when packets are
>       queued.

Couldn't these be done for the device in the existing qdisc
infrastructure (or dev_start_xmit)? Alternatively, rename ndo_start_xmit
to something else and make all the callers use the wrapper.

Changing all the drivers for something that the driver has no real
need to care about seems like incorrect object design.


* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-08 15:40 ` Stephen Hemminger
@ 2011-08-08 17:51   ` Tom Herbert
  2011-08-08 17:55     ` Stephen Hemminger
  0 siblings, 1 reply; 19+ messages in thread
From: Tom Herbert @ 2011-08-08 17:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev

On Mon, Aug 8, 2011 at 8:40 AM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> On Sun, 7 Aug 2011 21:43:13 -0700 (PDT)
> Tom Herbert <therbert@google.com> wrote:
>
>>     netdev_tx_completed_queue: Called at end of transmit completion
>>       to inform stack of number of bytes and packets processed.
>>     netdev_tx_sent_queue: Called to inform stack when packets are
>>       queued.
>
> Couldn't these be done for the device in the existing qdisc
> infrastructure (or dev_start_xmit)? Alternatively, rename ndo_start_xmit
> to something else and make all the callers use the wrapper.
>
> Changing all the drivers for something that the driver has no real
> need to care about seems like incorrect object design.
>
The netdev_tx_completed_queue is needed to inform the stack of the number
of packets and bytes completed in an execution of transmit completion
(an epoch).  I don't see a way to get that information outside of the
driver.

Tom


* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-08 17:51   ` Tom Herbert
@ 2011-08-08 17:55     ` Stephen Hemminger
  2011-08-08 17:56       ` Tom Herbert
  2011-08-08 18:01       ` Tom Herbert
  0 siblings, 2 replies; 19+ messages in thread
From: Stephen Hemminger @ 2011-08-08 17:55 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev

On Mon, 8 Aug 2011 10:51:06 -0700
Tom Herbert <therbert@google.com> wrote:

> On Mon, Aug 8, 2011 at 8:40 AM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> > On Sun, 7 Aug 2011 21:43:13 -0700 (PDT)
> > Tom Herbert <therbert@google.com> wrote:
> >
> >>     netdev_tx_completed_queue: Called at end of transmit completion
> >>       to inform stack of number of bytes and packets processed.
> >>     netdev_tx_sent_queue: Called to inform stack when packets are
> >>       queued.
> >
> > Couldn't these be done for the device in the existing qdisc
> > infrastructure (or dev_start_xmit)? Alternatively, rename ndo_start_xmit
> > to something else and make all the callers use the wrapper.
> >
> > Changing all the drivers for something that the driver has no real
> > need to care about seems like incorrect object design.
> >
> The netdev_tx_completed_queue is needed to inform the stack of the number
> of packets and bytes completed in an execution of transmit completion
> (an epoch).  I don't see a way to get that information outside of the
> driver.
> 
> Tom

Since transmit completion means calling dev_kfree_skb() why not account
there? You could add some info to netdev if necessary to compile
the statistics.

I just hate driver api complexity growth.
 


* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-08 17:55     ` Stephen Hemminger
@ 2011-08-08 17:56       ` Tom Herbert
  2011-08-08 18:01       ` Tom Herbert
  1 sibling, 0 replies; 19+ messages in thread
From: Tom Herbert @ 2011-08-08 17:56 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev

On Mon, Aug 8, 2011 at 10:55 AM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> On Mon, 8 Aug 2011 10:51:06 -0700
> Tom Herbert <therbert@google.com> wrote:
>
>> On Mon, Aug 8, 2011 at 8:40 AM, Stephen Hemminger <shemminger@vyatta.com> wrote:
>> > On Sun, 7 Aug 2011 21:43:13 -0700 (PDT)
>> > Tom Herbert <therbert@google.com> wrote:
>> >
>> >>     netdev_tx_completed_queue: Called at end of transmit completion
>> >>       to inform stack of number of bytes and packets processed.
>> >>     netdev_tx_sent_queue: Called to inform stack when packets are
>> >>       queued.
>> >
>> > Couldn't these be done for the device in the existing qdisc
>> > infrastructure (or dev_start_xmit)? Alternatively, rename ndo_start_xmit
>> > to something else and make all the callers use the wrapper.
>> >
>> > Changing all the drivers for something that the driver has no real
>> > need to care about seems like incorrect object design.
>> >
>> The netdev_tx_completed_queue is needed to inform the stack of the number
>> of packets and bytes completed in an execution of transmit completion
>> (an epoch).  I don't see a way to get that information outside of the
>> driver.
>>
>> Tom
>
> Since transmit completion means calling dev_kfree_skb() why not account
> there? You could add some info to netdev if necessary to compile
> the statistics.
>
> I just hate driver api complexity growth.
>
>


* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-08 17:55     ` Stephen Hemminger
  2011-08-08 17:56       ` Tom Herbert
@ 2011-08-08 18:01       ` Tom Herbert
  2011-08-08 18:19         ` Stephen Hemminger
  1 sibling, 1 reply; 19+ messages in thread
From: Tom Herbert @ 2011-08-08 18:01 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev

> Since transmit completion means calling dev_kfree_skb() why not account
> there? You could add some info to netdev if necessary to compile
> the statistics.
>
The algorithm depends on knowing the total number of packets completed
in a single execution of transmit completion (epoch based).  We only
want to recalculate the limits once per completion, which happens when
the completion function is called.

> I just hate driver api complexity growth.
>
>
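
To make the once-per-completion point concrete, here is a hedged
sketch of what the stack-side hook could reduce to (the dql field and
its placement are illustrative; the real patch also has to manage the
queue-state flags):

static inline void netdev_tx_completed_queue(struct netdev_queue *q,
                                             unsigned int pkts,
                                             unsigned int bytes)
{
        /* Exactly one limit recalculation per completion run. */
        dql_completed(&q->dql, bytes);

        /* Restart the queue if we are back under the limit. */
        if (netif_tx_queue_stopped(q) && dql_avail(&q->dql) >= 0)
                netif_tx_wake_queue(q);
}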


* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-08 18:01       ` Tom Herbert
@ 2011-08-08 18:19         ` Stephen Hemminger
  2011-08-09  7:41           ` David Miller
  0 siblings, 1 reply; 19+ messages in thread
From: Stephen Hemminger @ 2011-08-08 18:19 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev

On Mon, 8 Aug 2011 11:01:57 -0700
Tom Herbert <therbert@google.com> wrote:

> > Since transmit completion means calling dev_kfree_skb() why not account
> > there? You could add some info to netdev if necessary to compile
> > the statistics.
> >
> The algorithm depends on knowing the total number of packets completed
> in a single execution of transmit completion (epoch based).  We only
> want to recalculate the limits once per completion, which happens when
> the completion function is called.

So just add some stats to netdev and count the number of dev_kfree_skb
calls and do your work at napi complete.


* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-08 18:19         ` Stephen Hemminger
@ 2011-08-09  7:41           ` David Miller
  2011-08-09  8:06             ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2011-08-09  7:41 UTC (permalink / raw)
  To: shemminger; +Cc: therbert, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 8 Aug 2011 11:19:17 -0700

> On Mon, 8 Aug 2011 11:01:57 -0700
> Tom Herbert <therbert@google.com> wrote:
> 
>> > Since transmit completion means calling dev_kfree_skb() why not account
>> > there? You could add some info to netdev if necessary to compile
>> > the statistics.
>> >
>> The algorithm depends on knowing the total number of packets completed
>> in a single execution of transmit completion (epoch based).  We only
>> want to recalculate the limits once per completion, which happens when
>> the completion function is called.
> 
> So just add some stats to netdev and count the number of dev_kfree_skb
> calls and do your work at napi complete.

No, I think what Tom's doing here is reasonable.

First off, drivers have never been required to do specific things if
they return success for ->ndo_start_xmit().  They can kfree_skb(skb)
and return NET_TX_SUCCESS, and that's fine.

This means we can't use ->ndo_start_xmit() return values to drive the
BQL engine.
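
For illustration, a driver like this hedged sketch is perfectly legal
(foo_ring_full() is hypothetical), so the return value reveals nothing
about what was actually queued:

#include <linux/netdevice.h>

bool foo_ring_full(struct net_device *dev);     /* hypothetical */

static netdev_tx_t foo_start_xmit(struct sk_buff *skb,
                                  struct net_device *dev)
{
        if (foo_ring_full(dev)) {
                dev_kfree_skb_any(skb); /* drop the packet...          */
                return NETDEV_TX_OK;    /* ...and still report success */
        }
        /* ... queue to the hardware ... */
        return NETDEV_TX_OK;
}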

So we simply have a way for the driver to say when it really does
queue up a packet, and that might be useful for other things.

Second, you absolutely do not want to run the BQL engine on every
single dev_kfree_skb() call.  You want to batch it, and that's what
Tom's new interface does, and that is therefore quite reasonable
too.

In fact, I think the best part about this revision of the BQL patches
is the device driver interface. :-)



* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-09  7:41           ` David Miller
@ 2011-08-09  8:06             ` Eric Dumazet
  2011-08-09 14:54               ` Stephen Hemminger
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2011-08-09  8:06 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, therbert, netdev

On Tuesday 09 August 2011 at 00:41 -0700, David Miller wrote:
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Mon, 8 Aug 2011 11:19:17 -0700
> 
> > On Mon, 8 Aug 2011 11:01:57 -0700
> > Tom Herbert <therbert@google.com> wrote:
> > 
> >> > Since transmit completion means calling dev_kfree_skb() why not account
> >> > there? You could add some info to netdev if necessary to compile
> >> > the statistics.
> >> >
> >> The algorithm depends on knowing the total number of packets completed
> >> in a single execution of transmit completion (epoch based).  We only
> >> want to recalculate the limits once per completion, which happens when
> >> the completion function is called.
> > 
> > So just add some stats to netdev and count the number of dev_kfree_skb
> > calls and do your work at napi complete.
> 
> No, I think what Tom's doing here is reasonable.
> 
> First off, drivers have never been required to do specific things if
> they return success for ->ndo_start_xmit().  They can kfree_skb(skb)
> and return NET_TX_SUCCESS, and that's fine.
> 
> > This means we can't use ->ndo_start_xmit() return values to drive the
> BQL engine.
> 
> So we simply have a way for the driver to say when it really does
> queue up a packet, and that might be useful for other things.
> 
> Second, you absolutely do not want to run the BQL engine on every
> single dev_kfree_skb() call.  You want to batch it, and that's what
> Tom's new interface does, and that is therefore quite reasonable
> too.
> 
> In fact, I think the best part about this revision of the BQL patches
> is the device driver interface. :-)
> 

I agree, it permits a smooth transition for selected drivers, even some
non-NAPI ones ;)

In some stress situations, NAPI never completes, so it would also mean
adding a dql call in this case...





* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-09  8:06             ` Eric Dumazet
@ 2011-08-09 14:54               ` Stephen Hemminger
  2011-08-09 18:28                 ` Tom Herbert
  0 siblings, 1 reply; 19+ messages in thread
From: Stephen Hemminger @ 2011-08-09 14:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, therbert, netdev

On Tue, 09 Aug 2011 10:06:13 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Tuesday 09 August 2011 at 00:41 -0700, David Miller wrote:
> > From: Stephen Hemminger <shemminger@vyatta.com>
> > Date: Mon, 8 Aug 2011 11:19:17 -0700
> > 
> > > On Mon, 8 Aug 2011 11:01:57 -0700
> > > Tom Herbert <therbert@google.com> wrote:
> > > 
> > >> > Since transmit completion means calling dev_kfree_skb() why not account
> > >> > there? You could add some info to netdev if necessary to compile
> > >> > the statistics.
> > >> >
> > >> The algorithm depends on knowing the total number of packets completed
> > >> in a single execution of transmit completion (epoch based).  We only
> > >> want to recalculate the limits once per completion, which happens when
> > >> the completion function is called.
> > > 
> > > So just add some stats to netdev and count the number of dev_kfree_skb
> > > calls and do your work at napi complete.
> > 
> > No, I think what Tom's doing here is reasonable.
> > 
> > First off, drivers have never been required to do specific things if
> > they return success for ->ndo_start_xmit().  They can kfree_skb(skb)
> > and return NET_TX_SUCCESS, and that's fine.
> > 
> > > This means we can't use ->ndo_start_xmit() return values to drive the
> > BQL engine.
> > 
> > So we simply have a way for the driver to say when it really does
> > queue up a packet, and that might be useful for other things.
> > 
> > Second, you absolutely do not want to run the BQL engine on every
> > single dev_kfree_skb() call.  You want to batch it, and that's what
> > Tom's new interface does, and that is therefore quite reasonable
> > too.
> > 
> > In fact, I think the best part about this revision of the BQL patches
> > is the device driver interface. :-)
> > 
> 
> I agree, it permits a smooth transition for selected drivers, even some
> non-NAPI ones ;)
> 
> In some stress situations, NAPI never completes, so it would also mean
> adding a dql call in this case...

OK, then we need to fix all NAPI drivers at once, and get more
drivers converted over to NAPI. It is really annoying how there
is just too much variation among network drivers.



* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-09 14:54               ` Stephen Hemminger
@ 2011-08-09 18:28                 ` Tom Herbert
  0 siblings, 0 replies; 19+ messages in thread
From: Tom Herbert @ 2011-08-09 18:28 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, David Miller, netdev

> OK, then we need to fix all NAPI drivers at once, and get more
> drivers converted over to NAPI. It is really annoying how there
> is just too much variation among network drivers.

Agreed, this variation has created a mess and makes it difficult to add
this sort of functionality.  But I'm not sure what the "fix" would be.

For instance...

It seems like there are three primary points where drivers
interact with the stack to stop or wake up a queue:

1) On start-xmit, availability is checked and the queue is stopped if
no space is available.  Some drivers report an error if there was no
availability and the queue had not been stopped as expected.
2) At the end of start-xmit, availability is checked and the queue is
stopped if necessary.
3) At the end of completion processing, availability is checked and the
queue is woken up if necessary.

Each of these is several lines of code in each driver, and seems to be
begging for a unified abstraction.  For instance in tg3 instead of:

        if (unlikely(netif_tx_queue_stopped(txq) &&
                     (tg3_tx_avail(tnapi) > TG3_TX_WAKEUP_THRESH(tnapi)))) {
                __netif_tx_lock(txq, smp_processor_id());
                if (netif_tx_queue_stopped(txq) &&
                    (tg3_tx_avail(tnapi) > TG3_TX_WAKEUP_THRESH(tnapi)))
                        netif_tx_wake_queue(txq);
                __netif_tx_unlock(txq);
        }

we probably would want something like:

netif_tx_queue_complete_check(txq, pkts, bytes,
                              tg3_tx_avail(tnapi) > TG3_TX_WAKEUP_THRESH(tnapi))

which would do all the work to check for available space, wake up the
queue, do statistics, bql, etc. (one possible shape is sketched below).
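
Purely illustrative sketch of that helper -- nothing like this exists
in the tree.  Note that a plain boolean argument evaluates the
availability check only once, whereas the tg3 code above re-evaluates
it under the lock; that is exactly the kind of per-driver variation
mentioned below:

static inline void netif_tx_queue_complete_check(struct netdev_queue *txq,
                                                 unsigned int pkts,
                                                 unsigned int bytes,
                                                 bool avail)
{
        netdev_tx_completed_queue(txq, pkts, bytes);    /* stats, bql */

        if (unlikely(netif_tx_queue_stopped(txq) && avail)) {
                __netif_tx_lock(txq, smp_processor_id());
                if (netif_tx_queue_stopped(txq) && avail)
                        netif_tx_wake_queue(txq);
                __netif_tx_unlock(txq);
        }
}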

I actually did consider abstracting these out as part of bql, but even
though drivers implement these in similar ways, variations in each
driver might make introducing a unified abstraction nontrivial (such as
where drivers need to insert memory barriers, and variations in
locking).


* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-08  4:43 [RFC PATCH v2 0/9] bql: Byte Queue Limits Tom Herbert
  2011-08-08 13:06 ` jamal
  2011-08-08 15:40 ` Stephen Hemminger
@ 2011-08-12 18:20 ` Stephen Hemminger
  2011-08-13  0:33   ` Tom Herbert
  2 siblings, 1 reply; 19+ messages in thread
From: Stephen Hemminger @ 2011-08-12 18:20 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev

On Sun, 7 Aug 2011 21:43:13 -0700 (PDT)
Tom Herbert <therbert@google.com> wrote:

> Changes from last version:
> - Simplified and generalized driver interface.  Drivers need to
>   implement two functions:
>     netdev_tx_completed_queue: Called at end of transmit completion
>       to inform stack of number of bytes and packets processed.
>     netdev_tx_sent_queue: Called to inform stack when packets are
>       queued.
> 
>     netdev_tx_reset_queue: optional, resets state in the stack
> 
> - Added new per-queue flags that allow the stack to stop a queue
>   separately from the driver doing this.  Drivers continue using the
>   same functions to stop queues, but there are two functions that
>   the stack calls (to check whether a queue has been stopped by the
>   driver or the stack):
> 
>   netif_xmit_stopped, netif_xmit_frozen_or_stopped
> 
> - Added example support for bnx2x and sfc (demonstrates operation over
>   multi-queue)
> 
> - Removed BQL being under CONFIG_RPS (didn't add CONFIG_BQL)
> 
> - Still needs some more testing, including showing benefits to high
>   priority packets in QoS.
> ----
> 
> This patch series implements byte queue limits (bql) for NIC TX queues.
> 
> Byte queue limits are a mechanism to limit the size of the transmit
> hardware queue on a NIC by number of bytes. The goal of these byte
> limits is to reduce latency caused by excessive queuing in hardware
> without sacrificing throughput.
> 
> Hardware queuing limits are typically specified in terms of a number of
> hardware descriptors, each of which has a variable size. The size of
> individual queued items can vary over a very wide range. For
> instance with the e1000 NIC the size could range from 64 bytes to 4K
> (with TSO enabled). This variability makes it next to impossible to
> choose a single queue limit that prevents starvation and provides the
> lowest possible latency.
> 
> The objective of byte queue limits is to set the limit to be the
> minimum needed to prevent starvation between successive transmissions to
> the hardware. The latency between two transmissions can be variable in a
> system. It is dependent on interrupt frequency, NAPI polling latencies,
> scheduling of the queuing discipline, lock contention, etc. Therefore we
> propose that byte queue limits should be dynamic and change in
> accordance with the networking stack latencies a system encounters.

A couple of other related points:

1. Need a way to configure the values.  My preference would be a new
   netlink attribute set via 'ip link set' ...

2. Does this need to know the speed of the link (autonegotiation) or
   is it self-clocking?




* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-12 18:20 ` Stephen Hemminger
@ 2011-08-13  0:33   ` Tom Herbert
  2011-08-13  0:38     ` Stephen Hemminger
  0 siblings, 1 reply; 19+ messages in thread
From: Tom Herbert @ 2011-08-13  0:33 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev

> 1. Need a way to configure the values.  My preference would be a new
>   netlink attribute set via 'ip link set' ...
>
As opposed to using sysfs entries?

> 2. Does this need to know the speed of the link (autonegotiation) or

No.

>   is it self-clocking?

It is self-clocking.  The only time-related tunable is hold_time
(which I think can have a reasonable default).

Tom


* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-13  0:33   ` Tom Herbert
@ 2011-08-13  0:38     ` Stephen Hemminger
  0 siblings, 0 replies; 19+ messages in thread
From: Stephen Hemminger @ 2011-08-13  0:38 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev

On Fri, 12 Aug 2011 17:33:24 -0700
Tom Herbert <therbert@google.com> wrote:

> > 1. Need a way to configure the values.  My preference would be a new
> > >   netlink attribute set via 'ip link set' ...
> >
> As opposed to using sysfs entries?

Both are useful. But netlink is more consistent and has the
ability to be used as a transport.


* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-08 13:06 ` jamal
  2011-08-08 13:18   ` Johannes Berg
@ 2011-08-25 15:19   ` Tom Herbert
  1 sibling, 0 replies; 19+ messages in thread
From: Tom Herbert @ 2011-08-25 15:19 UTC (permalink / raw)
  To: jhs; +Cc: davem, netdev, Johannes Berg

> For wired connections I think the big deal is in improved
> runtime memory saving (your perf numbers are kinda ok).
> The challenge is going to be with wireless where the underlying
> bandwidth changes (and therefore the optimal queue size varies
> more frequently). The problem with active queue management is
> getting the feedback loop to be more accurate, and I think there
> will be challenges with wired devices.

The important characteristic (for us at least) will be reduced latency
for high priority packets (for NICs that don't support QoS multi-queue).
I do have data showing those benefits, but it's a little old.  I will
have something to present at LPC.

> I notice that you don't have any wireless devices;
> but it would be nice for someone to check this out on wireless.
> CCing Johannes - maybe he has some insight.
>
Yeah, these scare me ;-)

> cheers,
> jamal
>
>
>
>


* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-08 13:18   ` Johannes Berg
@ 2011-08-25 15:29     ` Tom Herbert
  2011-08-25 20:23       ` jamal
  0 siblings, 1 reply; 19+ messages in thread
From: Tom Herbert @ 2011-08-25 15:29 UTC (permalink / raw)
  To: Johannes Berg; +Cc: jhs, davem, netdev

> Well, the wireless case is curious, and has a whole bunch of corner
> cases, since it's not necessarily PtP; it can be PtMP!
>
> But considering the most basic case of us being a client connecting to
> an AP first: yes, the bandwidth will change dynamically; I don't know
> what impact this has on BQL.  Tom, maybe you can think about this a bit?
>
BQL is dynamic, and will increase the queue limit more aggressively
than decrease it.  So for instance, we can track the largest queue
needed over 30 seconds, which should be stable even in the presence
of fluctuating bandwidth.  The thing that worries me is whether the
HW queues conform to the queue characteristics described in the
patch.  If transmit completions are random and not regular, BQL
probably can't function well.
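
A hedged sketch of that tracking idea -- entirely illustrative, not
from the patches:

#include <linux/jiffies.h>

struct max_window {
        unsigned int max;               /* largest limit seen this window */
        unsigned long window_start;     /* jiffies when the window began  */
        unsigned long window_len;       /* e.g. 30 * HZ                   */
};

static unsigned int max_window_track(struct max_window *w, unsigned int val)
{
        if (time_after(jiffies, w->window_start + w->window_len)) {
                w->window_start = jiffies;      /* new window: restart max */
                w->max = val;
        } else if (val > w->max) {
                w->max = val;
        }
        return w->max;  /* a limit that is stable against momentary dips */
}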

If you'd like to bring this up on some wireless devices that would be
great; I don't have easy access to any right now, but I can try to
help otherwise.


> The second big challenge in wireless is the PtMP case: if we're acting
> as an AP, then we typically have four queues for any number of remote
> endpoints with varying bandwidth. I haven't found a good way to handle
> this: we can't have hardware queues per station (most HW is simply not
> capable of that many queues) but technically we would want to make the
> queue limits depend on the peer...
>
> Since I just returned from vacation I have tons of email to dig through,
> so I'll have to keep this short for now, but I'm definitely interested.
>
> johannes
>
>


* Re: [RFC PATCH v2 0/9] bql: Byte Queue Limits
  2011-08-25 15:29     ` Tom Herbert
@ 2011-08-25 20:23       ` jamal
  0 siblings, 0 replies; 19+ messages in thread
From: jamal @ 2011-08-25 20:23 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Johannes Berg, davem, netdev

On Thu, 2011-08-25 at 08:29 -0700, Tom Herbert wrote:

> BQL is dynamic, and will increase the queue limit more aggressively
> than decrease it.  So for instance, we can track the largest queue
> needed over 30 seconds, which should be stable even in the presence
> of fluctuating bandwidth.  The thing that worries me is whether the
> HW queues conform to the queue characteristics described in the
> patch.  If transmit completions are random and not regular, BQL
> probably can't function well.
> 

I think that's the challenge ;-> I wouldn't say it is random, but if my
understanding is correct the effect is a factor of the number of
stations, etc.

> If you'd like to bring this up on some wireless devices that would be
> great; I don't have easy access to any right now, but I can try to
> help otherwise.

I am most curious as well...

cheers,
jamal
