* On Linux rate limiting and the magic value of 34.64 Gbps...
@ 2011-03-25  7:14 Maciej Żenczykowski
  2011-03-25  7:17 ` Eric Dumazet
From: Maciej Żenczykowski @ 2011-03-25  7:14 UTC
  To: Linux NetDev; +Cc: Eric Dumazet, David Miller

Hey,

The Linux rate limiting code relies on the rate field of struct tc_ratespec.
This field is a __u32 and measures rate in "bytes per second".

This means the maximum representable rate is just under 4 GiB per second
(2^32 - 1 bytes/s). That is equivalent to 34.36 Gbps, and I just ran across
that limit with 40 Gbps (which behaves like 5.64 Gbps because of overflow/truncation).
Seeing as this structure is exposed to userspace for both read and
write via various netlink paths (in cbq, htb, tbf, etc...) there
doesn't seem to be a particularly clean way to increase the size of
this field.  While there is a __reserved field that could
theoretically be repurposed as some sort of rate bit shift register, I
don't think we can rely on __reserved having been set to zero by
userspace (by older programs), and we will definitely see problems
with output by programs (tc) that don't expect to have to parse this
field to output an understandable rate limit...
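
A quick sketch of the arithmetic above, in Python purely for illustration. The helper names are made up for this example, and the wrap is modelled as a plain mask to 32 bits, which is an assumption about where the truncation happens:

```python
U32_MAX = 2**32 - 1  # largest value a __u32 rate field can hold


def gbps_to_bytes_per_sec(gbps):
    """Convert a rate in Gbps (10**9 bits/s) to whole bytes per second."""
    return (gbps * 10**9) // 8


def store_in_u32(value):
    """Model storing a value in a 32-bit unsigned field (high bits dropped)."""
    return value & U32_MAX


requested = gbps_to_bytes_per_sec(40)   # 5_000_000_000 bytes/s
stored = store_in_u32(requested)        # 705_032_704 bytes/s after wrapping
effective_gbps = stored * 8 / 1e9       # what the limiter actually enforces
max_gbps = U32_MAX * 8 / 1e9            # ceiling of the 32-bit field

print(f"requested {requested} B/s, stored {stored} B/s")
print(f"effective {effective_gbps:.2f} Gbps, ceiling {max_gbps:.2f} Gbps")
```

The wrapped value is simply the requested rate minus 2^32, which is where the 5.64 Gbps figure comes from.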

Anybody have any bright ideas?

Thanks,
Maciej


* Re: On Linux rate limiting and the magic value of 34.64 Gbps...
  2011-03-25  7:14 On Linux rate limiting and the magic value of 34.64 Gbps Maciej Żenczykowski
@ 2011-03-25  7:17 ` Eric Dumazet
  2011-03-28  4:43   ` Maciej Żenczykowski
From: Eric Dumazet @ 2011-03-25  7:17 UTC
  To: Maciej Żenczykowski; +Cc: Linux NetDev, David Miller

On Friday, 25 March 2011 at 00:14 -0700, Maciej Żenczykowski wrote:
> Hey,
> 
> The Linux rate limiting code relies on the rate field of struct tc_ratespec.
> This field is a __u32 and measures rate in "bytes per second".
> 
> This basically means maximum representable rate is 4GB per second.
> This is equivalent to 34.36 Gbps and I just ran across that limit with
> 40 Gbps (which behaves like 5.64 Gbps because of overflow/truncation).
> Seeing as this structure is exposed to userspace for both read and
> write via various netlink paths (in cbq, htb, tbf, etc...) there
> doesn't seem to be a particularly clean way to increase the size of
> this field.  While there is a __reserved field that could
> theoretically be repurposed as some sort of rate bit shift register, I
> don't think we can rely on __reserved having been set to zero by
> userspace (by older programs), and we will definitely see problems
> with output by programs (tc) that don't expect to have to parse this
> field to output an understandable rate limit...
> 
> Anybody have any bright ideas?

Well, netlink is extensible, so we can easily add a new structure, with
64bit fields if necessary.

We did that for 64bit stats already.
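
A rough sketch of what that could look like on the wire: keep the legacy 32-bit field and carry the true rate in an extra 64-bit attribute when it does not fit. The attribute type number and the helper names below are hypothetical, and saturating the legacy field is just one possible choice for this illustration. Only the generic netlink attribute framing (16-bit length, 16-bit type, payload padded to 4 bytes) is standard.

```python
import struct

U32_MAX = 0xFFFFFFFF
NLA_HDRLEN = 4        # struct nlattr: u16 nla_len + u16 nla_type
NLA_ALIGNTO = 4       # attributes are padded to 4-byte boundaries
ATTR_RATE64 = 0x7F    # HYPOTHETICAL attribute type, not a real kernel value


def nla_pack(attr_type, payload):
    """Frame a payload as one netlink attribute (host byte order)."""
    length = NLA_HDRLEN + len(payload)      # nla_len excludes the padding
    padding = -length % NLA_ALIGNTO
    return struct.pack("=HH", length, attr_type) + payload + b"\x00" * padding


def encode_rate(rate):
    """Return (legacy_u32, extra_attr_bytes) for a rate in bytes/s."""
    legacy = min(rate, U32_MAX)             # saturate instead of wrapping
    extra = b""
    if rate > U32_MAX:
        extra = nla_pack(ATTR_RATE64, struct.pack("=Q", rate))
    return legacy, extra


def decode_rate(legacy, extra):
    """Prefer the 64-bit attribute when present, else the legacy field."""
    if extra:
        length, attr_type = struct.unpack_from("=HH", extra)
        if attr_type == ATTR_RATE64 and length == NLA_HDRLEN + 8:
            return struct.unpack_from("=Q", extra, NLA_HDRLEN)[0]
    return legacy


legacy, extra = encode_rate(5_000_000_000)  # 40 Gbps in bytes/s
print(legacy, len(extra), decode_rate(legacy, extra))
```

A receiver that does not know the new attribute would still see a valid, if too low, rate in the old field.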





* Re: On Linux rate limiting and the magic value of 34.64 Gbps...
  2011-03-25  7:17 ` Eric Dumazet
@ 2011-03-28  4:43   ` Maciej Żenczykowski
From: Maciej Żenczykowski @ 2011-03-28  4:43 UTC
  To: Eric Dumazet; +Cc: Linux NetDev, David Miller

2011/3/25 Eric Dumazet <eric.dumazet@gmail.com>:
> On Friday, 25 March 2011 at 00:14 -0700, Maciej Żenczykowski wrote:
>> Hey,
>>
>> The Linux rate limiting code relies on the rate field of struct tc_ratespec.
>> This field is a __u32 and measures rate in "bytes per second".
>>
>> This basically means maximum representable rate is 4GB per second.
>> This is equivalent to 34.36 Gbps and I just ran across that limit with
>> 40 Gbps (which behaves like 5.64 Gbps because of overflow/truncation).
>> Seeing as this structure is exposed to userspace for both read and
>> write via various netlink paths (in cbq, htb, tbf, etc...) there
>> doesn't seem to be a particularly clean way to increase the size of
>> this field.  While there is a __reserved field that could
>> theoretically be repurposed as some sort of rate bit shift register, I
>> don't think we can rely on __reserved having been set to zero by
>> userspace (by older programs), and we will definitely see problems
>> with output by programs (tc) that don't expect to have to parse this
>> field to output an understandable rate limit...
>>
>> Anybody have any bright ideas?
>
> Well, netlink is extensible, so we can easily add a new structure, with
> 64bit fields if necessary.
>
> We did that for 64bit stats already.

I assume you are referring to:

commit 10708f37ae729baba9b67bd134c3720709d4ae62
Author: Jan Engelhardt <jengelh@medozas.de>
Date:   Thu Mar 11 09:57:29 2010 +0000

    net: core: add IFLA_STATS64 support

    `ip -s link` shows interface counters truncated to 32 bit. This is
    because interface statistics are transported only in 32-bit quantity
    to userspace. This commit adds a new IFLA_STATS64 attribute that
    exports them in full 64 bit.

    References: http://lkml.indiana.edu/hypermail/linux/kernel/0307.3/0215.html
    Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

include/linux/if_link.h
net/core/rtnetlink.c

That commit is relatively simple, since the statistics structure is
only ever exported by the kernel and is only exported in one location.

Here the situation is significantly more complex: we both export this
structure to userspace and import it from userspace, and we do so in
many different locations.

((v2.6.38))$ egrep -r 'qdisc_(get|put)_rtab' . | wc -l
36

It looks like the amount of backward compatibility code would have
to be quite large.
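
To make the compatibility constraint concrete, here is a sketch of the fixed layout that every one of those call sites, and every existing userspace binary, has to agree on. The field order is assumed from the 2.6.38-era struct tc_ratespec and should be double-checked against include/linux/pkt_sched.h. Python's struct module is used only for illustration, and the helper names are made up.

```python
import struct

# Assumed field order (verify against pkt_sched.h):
#   cell_log (u8), __reserved (u8), overhead (u16),
#   cell_align (s16), mpu (u16), rate (u32, bytes per second)
TC_RATESPEC = struct.Struct("=BBHhHI")


def pack_ratespec(rate, cell_log=0, reserved=0, overhead=0, cell_align=0, mpu=0):
    """Serialize the fields the way both kernel and userspace expect them."""
    return TC_RATESPEC.pack(cell_log, reserved, overhead, cell_align, mpu, rate)


def unpack_rate(blob):
    """Read back only the rate field from a serialized ratespec."""
    return TC_RATESPEC.unpack(blob)[5]


blob = pack_ratespec(1_250_000_000)      # 10 Gbps fits in the u32 field
print(TC_RATESPEC.size, unpack_rate(blob))

try:
    pack_ratespec(5_000_000_000)         # 40 Gbps does not fit
    fits = True
except struct.error:
    fits = False
print("40 Gbps fits:", fits)
```

Unlike C, Python refuses to pack the out-of-range value rather than silently truncating it, which makes the missing headroom in the layout easy to see.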

