* HTB accuracy on 10GbE
@ 2009-11-02  7:22 Ryousei Takano
  2009-11-02  8:17 ` Badalian Vyacheslav
  2009-11-02 15:43 ` Patrick McHardy
  0 siblings, 2 replies; 21+ messages in thread
From: Ryousei Takano @ 2009-11-02  7:22 UTC
  To: shemminger; +Cc: Linux Netdev List, takano-ryousei

Hi Stephen and all,

I have observed an HTB accuracy problem with Linux kernel 2.6.30 and
the Myri-10G 10 GbE NIC.
HTB can control the transmission rate accurately at Gigabit speed, but
it does not work well at 10 Gigabit speed.

I asked Stephen about this problem at the Japan Linux Symposium.  He
mentioned an HTB bug related to timer granularity.
I would like to know what is happening and what should be done to fix it.

Any comments and suggestions will be welcome.

For more detail, please see the following page:
http://code.google.com/p/pspacer/wiki/HTBon10GbE

Best regards,
Ryousei

* Re: HTB accuracy on 10GbE
  2009-11-02  7:22 HTB accuracy on 10GbE Ryousei Takano
@ 2009-11-02  8:17 ` Badalian Vyacheslav
  2009-11-02 15:43 ` Patrick McHardy
  1 sibling, 0 replies; 21+ messages in thread
From: Badalian Vyacheslav @ 2009-11-02  8:17 UTC
  To: Ryousei Takano; +Cc: shemminger, Linux Netdev List, takano-ryousei

Hello.

We are also planning to consolidate 5-10 servers with 1 Gigabit
connections into one BIG server with a 10G network card (Intel
multiqueue); peak traffic is about 6 Gigabit.

We can test any patches that fix problems with 10 Gigabit connections!

Thanks!

JSC BIG Telecom

> Hi Stephen and all,
> 
> I have observed an HTB accuracy problem with Linux kernel 2.6.30 and
> the Myri-10G 10 GbE NIC.
> HTB can control the transmission rate accurately at Gigabit speed, but
> it does not work well at 10 Gigabit speed.
> 
> I asked Stephen about this problem at the Japan Linux Symposium.  He
> mentioned an HTB bug related to timer granularity.
> I would like to know what is happening and what should be done to fix it.
> 
> Any comments and suggestions will be welcome.
> 
> For more detail, please see the following page:
> http://code.google.com/p/pspacer/wiki/HTBon10GbE
> 
> Best regards,
> Ryousei

* Re: HTB accuracy on 10GbE
  2009-11-02  7:22 HTB accuracy on 10GbE Ryousei Takano
  2009-11-02  8:17 ` Badalian Vyacheslav
@ 2009-11-02 15:43 ` Patrick McHardy
  2009-11-02 20:53   ` Stephen Hemminger
  1 sibling, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2009-11-02 15:43 UTC
  To: Ryousei Takano; +Cc: shemminger, Linux Netdev List, takano-ryousei

Ryousei Takano wrote:
> Hi Stephen and all,
> 
> I have observed an HTB accuracy problem with Linux kernel 2.6.30 and
> the Myri-10G 10 GbE NIC.
> HTB can control the transmission rate accurately at Gigabit speed, but
> it does not work well at 10 Gigabit speed.
> 
> I asked Stephen about this problem at the Japan Linux Symposium.  He
> mentioned an HTB bug related to timer granularity.
> I would like to know what is happening and what should be done to fix it.
> 
> Any comments and suggestions will be welcome.
> 
> For more detail, please see the following page:
> http://code.google.com/p/pspacer/wiki/HTBon10GbE

This is not an easy problem to fix. Userspace, the kernel and the
netlink API use 32 bits for timing-related values, which is too small
to use more than microsecond resolution. All of them need to be
converted to use bigger types; additionally, some kind of compatibility
handling to deal with old iproute versions still using microsecond
resolution is required.

* Re: HTB accuracy on 10GbE
  2009-11-02 15:43 ` Patrick McHardy
@ 2009-11-02 20:53   ` Stephen Hemminger
  2009-11-03  7:43     ` Badalian Vyacheslav
  2009-11-04  3:13     ` Ryousei Takano
  0 siblings, 2 replies; 21+ messages in thread
From: Stephen Hemminger @ 2009-11-02 20:53 UTC
  To: Patrick McHardy; +Cc: Ryousei Takano, Linux Netdev List, takano-ryousei

On Mon, 02 Nov 2009 16:43:42 +0100
Patrick McHardy <kaber@trash.net> wrote:

> Ryousei Takano wrote:
> > Hi Stephen and all,
> > 
> > I have observed an HTB accuracy problem with Linux kernel 2.6.30 and
> > the Myri-10G 10 GbE NIC.
> > HTB can control the transmission rate accurately at Gigabit speed, but
> > it does not work well at 10 Gigabit speed.
> > 
> > I asked Stephen about this problem at the Japan Linux Symposium.  He
> > mentioned an HTB bug related to timer granularity.
> > I would like to know what is happening and what should be done to fix it.
> > 
> > Any comments and suggestions will be welcome.
> > 
> > For more detail, please see the following page:
> > http://code.google.com/p/pspacer/wiki/HTBon10GbE
> 
> This is not an easy problem to fix. Userspace, the kernel and the
> netlink API use 32 bits for timing-related values, which is too small
> to use more than microsecond resolution. All of them need to be
> converted to use bigger types; additionally, some kind of compatibility
> handling to deal with old iproute versions still using microsecond
> resolution is required.

The existing API is a legacy mish-mash. The field is limited to 32 bits,
but it might be possible to use a finer scale.

Maybe if the kernel advertised a finer resolution through /proc/net/psched,
the rate table could be finer grained. This would maintain compatibility
between kernel and user space. You would need a new kernel and a
new iproute to get nanosecond resolution, but older combinations would
still work.
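Stephen's proposal builds on the fact that tc already reads the scheduler clock parameters from /proc/net/psched. The C sketch below parses such a line. The field names t2us, us2t and clock_res, and the ticks-per-microsecond formula, follow our reading of iproute2's tc_core_init() and should be checked against the iproute2 source; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* First three fields of /proc/net/psched (names as in iproute2, per our reading). */
struct psched_info {
    unsigned int t2us;      /* field 1 */
    unsigned int us2t;      /* field 2 */
    unsigned int clock_res; /* field 3 */
};

/* Parse the first three hex fields of a /proc/net/psched line.
 * Returns 0 on success, -1 if the line is malformed. */
static int parse_psched(const char *line, struct psched_info *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, "%8x %8x %8x", &out->t2us, &out->us2t, &out->clock_res) != 3)
        return -1;
    if (out->us2t == 0)
        return -1; /* would divide by zero in ticks_per_usec() */
    return 0;
}

/* Scheduler ticks per microsecond, computed the way we understand
 * tc_core_init() to do it: t2us / us2t, scaled by clock_res / 1e6. */
static double ticks_per_usec(const struct psched_info *p)
{
    double clock_factor = (double)p->clock_res / 1000000.0;
    return (double)p->t2us / (double)p->us2t * clock_factor;
}
```

For illustration only (these are assumed values, not output captured in this thread): a line such as `000003e8 00000400 000f4240 3b9aca00` would describe 1024 ns ticks, about 0.98 ticks per usec, while `000003e8 00000040 000f4240 3b9aca00` would describe 64 ns ticks, 15.625 ticks per usec.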

The downside is that with nanosecond resolution the time per packet is
upper bounded at about 4.29 seconds (2^32 ns).
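The bound follows directly from the field width: an unsigned 32-bit count of nanoseconds tops out at 2^32 - 1 ns, just under 4.3 seconds. A minimal C sketch of that arithmetic (function names are hypothetical), including the slowest rate at which a packet's transmit time still fits in the field:

```c
#include <assert.h>
#include <stdint.h>

/* Longest transmit time a u32 nanosecond field can hold, in seconds. */
static double max_packet_time_sec(void)
{
    return (double)UINT32_MAX / 1e9; /* 4294967295 ns, about 4.29 s */
}

/* Slowest rate, in bit/s, at which a packet of pkt_bytes still has a
 * transmit time that fits in the u32 field; slower rates would overflow. */
static double min_rate_bps(unsigned int pkt_bytes)
{
    assert(pkt_bytes > 0);
    return (double)pkt_bytes * 8.0 / max_packet_time_sec();
}
```

For a 1500-byte packet this works out to roughly 2.8 kbit/s, so the limit only affects very low configured rates.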


* Re: HTB accuracy on 10GbE
  2009-11-02 20:53   ` Stephen Hemminger
@ 2009-11-03  7:43     ` Badalian Vyacheslav
  2009-11-03  9:33       ` Jarek Poplawski
  2009-11-04  3:13     ` Ryousei Takano
  1 sibling, 1 reply; 21+ messages in thread
From: Badalian Vyacheslav @ 2009-11-03  7:43 UTC
  To: Stephen Hemminger
  Cc: Patrick McHardy, Ryousei Takano, Linux Netdev List,
	takano-ryousei, Eric Dumazet, David Miller

Hello dear netdev team!

Linux has always kept up with the times :)
Networks around the world are moving to 10G technologies. I can test any
stress patches on a production system for the Linux developers :)
I leave our company on 1 December, so I have one month for all the tests.
I believe the Linux netdev team can do this easily. We only need to begin :)
Let's do it together :)

Jarek, you have helped us fix small problems in HTB many times, thanks for
this! Everything works great! Now the netdev team has "crazy" Eric, who
writes great code and is not afraid of big code changes. Maybe together
you could think about the changes and create a mega patch for me to test? :)
I always read all the code changes on the netdev mailing list. It's my
morning coffee time :)

It would also be interesting to hear what David says: does Linux
networking plan to fully support 10G technologies?

We are buying a 4x4 Xeon Quad server with 2 Intel CX4 network cards to
test 10G shaping in our network :) Let's begin testing! :)

Best regards, Slavon.
Tech Director Assistant.
JSC BIG Telecom
Moscow, Russia

> On Mon, 02 Nov 2009 16:43:42 +0100
> Patrick McHardy <kaber@trash.net> wrote:
> 
>> Ryousei Takano wrote:
>>> Hi Stephen and all,
>>>
>>> I have observed an HTB accuracy problem with Linux kernel 2.6.30 and
>>> the Myri-10G 10 GbE NIC.
>>> HTB can control the transmission rate accurately at Gigabit speed, but
>>> it does not work well at 10 Gigabit speed.
>>>
>>> I asked Stephen about this problem at the Japan Linux Symposium.  He
>>> mentioned an HTB bug related to timer granularity.
>>> I would like to know what is happening and what should be done to fix it.
>>>
>>> Any comments and suggestions will be welcome.
>>>
>>> For more detail, please see the following page:
>>> http://code.google.com/p/pspacer/wiki/HTBon10GbE
>> This is not an easy problem to fix. Userspace, the kernel and the
>> netlink API use 32 bits for timing-related values, which is too small
>> to use more than microsecond resolution. All of them need to be
>> converted to use bigger types; additionally, some kind of compatibility
>> handling to deal with old iproute versions still using microsecond
>> resolution is required.
> 
> The existing API is a legacy mish-mash. The field is limited to 32 bits,
> but it might be possible to use a finer scale.
> 
> Maybe if the kernel advertised a finer resolution through /proc/net/psched,
> the rate table could be finer grained. This would maintain compatibility
> between kernel and user space. You would need a new kernel and a
> new iproute to get nanosecond resolution, but older combinations would
> still work.
> 
> The downside is that with nanosecond resolution the time per packet is
> upper bounded at about 4.29 seconds (2^32 ns).
> 

* Re: HTB accuracy on 10GbE
  2009-11-03  7:43     ` Badalian Vyacheslav
@ 2009-11-03  9:33       ` Jarek Poplawski
  2009-11-03 10:13         ` Badalian Vyacheslav
  0 siblings, 1 reply; 21+ messages in thread
From: Jarek Poplawski @ 2009-11-03  9:33 UTC
  To: Badalian Vyacheslav
  Cc: Stephen Hemminger, Patrick McHardy, Ryousei Takano,
	Linux Netdev List, takano-ryousei, Eric Dumazet, David Miller

On 03-11-2009 08:43, Badalian Vyacheslav wrote:
> Hello dear netdev team!
Hello dear BIG Telecom!

> 
> Linux has always kept up with the times :)
> Networks around the world are moving to 10G technologies. I can test any
> stress patches on a production system for the Linux developers :)
> I leave our company on 1 December, so I have one month for all the tests.
> I believe the Linux netdev team can do this easily. We only need to begin :)
> Let's do it together :)
> 
> Jarek, you have helped us fix small problems in HTB many times, thanks for this!
> Everything works great!

Really?! As a matter of fact, the last change isn't fully used yet,
especially this part:

http://marc.info/?l=linux-netdev&m=124453482324409&w=2

And nobody even noticed it for quite a long time. I'm not even sure
Ryousei needs this now for anything but comparing with some better
tool...

Best regards,
Jarek P.

PS: Slavon, (as usual ;-) wrap the lines and don't top post, please.

* Re: HTB accuracy on 10GbE
  2009-11-03  9:33       ` Jarek Poplawski
@ 2009-11-03 10:13         ` Badalian Vyacheslav
  2009-11-03 10:54           ` Jarek Poplawski
  0 siblings, 1 reply; 21+ messages in thread
From: Badalian Vyacheslav @ 2009-11-03 10:13 UTC
  Cc: Patrick McHardy, Linux Netdev List

> Really?! As a matter of fact, the last change isn't fully used yet,
> especially this part:
> 
> http://marc.info/?l=linux-netdev&m=124453482324409&w=2
> 
> And nobody even noticed it for quite a long time. I'm not even sure
> Ryousei needs this now for anything but comparing with some better
> tool...
> 
> Best regards,
> Jarek P.
> 
> PS: Slavon, (as usual ;-) wrap the lines and don't top post, please.
> 
> 

So these changes were never tested and applied to the kernel?

Now I remember: I promised to test this patch, and judging by the
correspondence I never did. My mistake; I am ready to correct it.
Probably there were some difficulties, and then I simply forgot.
Could you send me the full patchset again for testing?

Best regards, Slavon


* Re: HTB accuracy on 10GbE
  2009-11-03 10:13         ` Badalian Vyacheslav
@ 2009-11-03 10:54           ` Jarek Poplawski
  2009-11-03 11:13             ` Badalian Vyacheslav
  0 siblings, 1 reply; 21+ messages in thread
From: Jarek Poplawski @ 2009-11-03 10:54 UTC
  To: Badalian Vyacheslav; +Cc: Patrick McHardy, Linux Netdev List

On 03-11-2009 11:13, Badalian Vyacheslav wrote:
>> Really?! As a matter of fact, the last change isn't fully used yet,
>> especially this part:
>>
>> http://marc.info/?l=linux-netdev&m=124453482324409&w=2
>>
>> And nobody even noticed it for quite a long time. I'm not even sure
>> Ryousei needs this now for anything but comparing with some better
>> tool...
> 
> So these changes were never tested and applied to the kernel?
> 
> Now I remember: I promised to test this patch, and judging by the
> correspondence I never did. My mistake; I am ready to correct it.
> Probably there were some difficulties, and then I simply forgot.
> Could you send me the full patchset again for testing?

There were a few iproute changes, tested enough I guess. Alas, I
didn't keep them (they can be found via the link above).

But it's not about your testing. I meant: since nobody noticed
something was (still) wrong with 1G scheduling, why bother with 10G?

Best regards,
Jarek P.

PS: ...and don't remove me from CC, please ;-)

* Re: HTB accuracy on 10GbE
  2009-11-03 10:54           ` Jarek Poplawski
@ 2009-11-03 11:13             ` Badalian Vyacheslav
  0 siblings, 0 replies; 21+ messages in thread
From: Badalian Vyacheslav @ 2009-11-03 11:13 UTC
  To: Jarek Poplawski; +Cc: Patrick McHardy, Linux Netdev List

Jarek Poplawski wrote:
> On 03-11-2009 11:13, Badalian Vyacheslav wrote:
>>> Really?! As a matter of fact, the last change isn't fully used yet,
>>> especially this part:
>>>
>>> http://marc.info/?l=linux-netdev&m=124453482324409&w=2
>>>
>>> And nobody even noticed it for quite a long time. I'm not even sure
>>> Ryousei needs this now for anything but comparing with some better
>>> tool...
>> So these changes were never tested and applied to the kernel?
>>
>> Now I remember: I promised to test this patch, and judging by the
>> correspondence I never did. My mistake; I am ready to correct it.
>> Probably there were some difficulties, and then I simply forgot.
>> Could you send me the full patchset again for testing?
> 
> There were a few iproute changes, tested enough I guess. Alas, I
> didn't keep them (they can be found via the link above).
> 
> But it's not about your testing. I meant: since nobody noticed
> something was (still) wrong with 1G scheduling, why bother with 10G?
> 

OK. We get the server next week and will test it then :)


> Best regards,
> Jarek P.
> 
> PS: ...and don't remove me from CC, please ;-)
> 
> 

Sorry. My mail server limits the CC list :)



* Re: HTB accuracy on 10GbE
  2009-11-02 20:53   ` Stephen Hemminger
  2009-11-03  7:43     ` Badalian Vyacheslav
@ 2009-11-04  3:13     ` Ryousei Takano
  2009-11-04  3:45       ` Ryousei Takano
  2009-11-04  5:03       ` Eric Dumazet
  1 sibling, 2 replies; 21+ messages in thread
From: Ryousei Takano @ 2009-11-04  3:13 UTC
  To: Stephen Hemminger; +Cc: Patrick McHardy, Linux Netdev List, takano-ryousei

Hi Patrick and Stephen,

Thanks for your comments.

I retried with a newer kernel and iproute2, and added the experimental results
to my page.  Please see 'Experimental result 2':
    http://code.google.com/p/pspacer/wiki/HTBon10GbE

The accuracy improved compared with the previous experiment:
the deviation dropped from +810 Mbps to +430 Mbps, because
the timer resolution improved from 1 usec to 1/64 usec.
But it is still not perfect.

Best regards,
Ryousei Takano


On Tue, Nov 3, 2009 at 5:53 AM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> On Mon, 02 Nov 2009 16:43:42 +0100
> Patrick McHardy <kaber@trash.net> wrote:
>
>> Ryousei Takano wrote:
>> > Hi Stephen and all,
>> >
>> > I have observed an HTB accuracy problem with Linux kernel 2.6.30 and
>> > the Myri-10G 10 GbE NIC.
>> > HTB can control the transmission rate accurately at Gigabit speed, but
>> > it does not work well at 10 Gigabit speed.
>> >
>> > I asked Stephen about this problem at the Japan Linux Symposium.  He
>> > mentioned an HTB bug related to timer granularity.
>> > I would like to know what is happening and what should be done to fix it.
>> >
>> > Any comments and suggestions will be welcome.
>> >
>> > For more detail, please see the following page:
>> > http://code.google.com/p/pspacer/wiki/HTBon10GbE
>>
>> This is not an easy problem to fix. Userspace, the kernel and the
>> netlink API use 32 bits for timing-related values, which is too small
>> to use more than microsecond resolution. All of them need to be
>> converted to use bigger types; additionally, some kind of compatibility
>> handling to deal with old iproute versions still using microsecond
>> resolution is required.
>
> The existing API is a legacy mish-mash. The field is limited to 32 bits,
> but it might be possible to use a finer scale.
>
> Maybe if the kernel advertised a finer resolution through /proc/net/psched,
> the rate table could be finer grained. This would maintain compatibility
> between kernel and user space. You would need a new kernel and a
> new iproute to get nanosecond resolution, but older combinations would
> still work.
>
> The downside is that with nanosecond resolution the time per packet is
> upper bounded at about 4.29 seconds (2^32 ns).
>
>

* Re: HTB accuracy on 10GbE
  2009-11-04  3:13     ` Ryousei Takano
@ 2009-11-04  3:45       ` Ryousei Takano
  2009-11-04  5:03       ` Eric Dumazet
  1 sibling, 0 replies; 21+ messages in thread
From: Ryousei Takano @ 2009-11-04  3:45 UTC
  To: Stephen Hemminger; +Cc: Patrick McHardy, Linux Netdev List, takano-ryousei

On Wed, Nov 4, 2009 at 12:13 PM, Ryousei Takano <ryousei@gmail.com> wrote:
> Hi Patrick and Stephen,
>
> Thanks for your comments.
>
> I retried with a newer kernel and iproute2, and added the experimental results
> to my page.  Please see 'Experimental result 2':
>    http://code.google.com/p/pspacer/wiki/HTBon10GbE
>
> The accuracy improved compared with the previous experiment:
> the deviation dropped from +810 Mbps to +430 Mbps, because
> the timer resolution improved from 1 usec to 1/64 usec.
> But it is still not perfect.
>
Oops, not 1/64 usec but 1/16 usec.
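Why a finer tick helps can be seen with a simplified model. Assume each packet's transmit time is truncated to a whole number of scheduler ticks (a simplification; the real rate table adds slot rounding on top). Every packet is then under-charged, so the achieved rate overshoots the configured one. The sketch compares 1 usec ticks with 64 ns ticks (our assumption for "1/16 usec"); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Transmit time of one packet in whole scheduler ticks, truncated toward
 * zero (the simplified model described above this block). */
static uint64_t xmit_ticks(uint64_t rate_bps, uint64_t pkt_bytes, uint64_t tick_ns)
{
    assert(rate_bps > 0 && tick_ns > 0);
    return pkt_bytes * 8 * 1000000000ULL / (rate_bps * tick_ns);
}

/* Rate actually enforced when every packet is charged xmit_ticks() ticks
 * instead of its exact transmit time. */
static double effective_rate_bps(uint64_t rate_bps, uint64_t pkt_bytes, uint64_t tick_ns)
{
    uint64_t ticks = xmit_ticks(rate_bps, pkt_bytes, tick_ns);
    assert(ticks > 0); /* a zero charge would mean no rate limit at all */
    return (double)(pkt_bytes * 8) * 1e9 / (double)(ticks * tick_ns);
}
```

For 9000-byte packets on a 7 Gbit/s class the exact transmit time is about 10.29 usec. With 1 usec ticks it is charged as 10 ticks, so the class runs at 7.2 Gbit/s; with 64 ns ticks it is charged 160 ticks (10.24 usec), about 7.03 Gbit/s.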


> Best regards,
> Ryousei Takano
>
>
> On Tue, Nov 3, 2009 at 5:53 AM, Stephen Hemminger <shemminger@vyatta.com> wrote:
>> On Mon, 02 Nov 2009 16:43:42 +0100
>> Patrick McHardy <kaber@trash.net> wrote:
>>
>>> Ryousei Takano wrote:
>>> > Hi Stephen and all,
>>> >
>>> > I have observed an HTB accuracy problem with Linux kernel 2.6.30 and
>>> > the Myri-10G 10 GbE NIC.
>>> > HTB can control the transmission rate accurately at Gigabit speed, but
>>> > it does not work well at 10 Gigabit speed.
>>> >
>>> > I asked Stephen about this problem at the Japan Linux Symposium.  He
>>> > mentioned an HTB bug related to timer granularity.
>>> > I would like to know what is happening and what should be done to fix it.
>>> >
>>> > Any comments and suggestions will be welcome.
>>> >
>>> > For more detail, please see the following page:
>>> > http://code.google.com/p/pspacer/wiki/HTBon10GbE
>>>
>>> This is not an easy problem to fix. Userspace, the kernel and the
>>> netlink API use 32 bits for timing-related values, which is too small
>>> to use more than microsecond resolution. All of them need to be
>>> converted to use bigger types; additionally, some kind of compatibility
>>> handling to deal with old iproute versions still using microsecond
>>> resolution is required.
>>
>> The existing API is a legacy mish-mash. The field is limited to 32 bits,
>> but it might be possible to use a finer scale.
>>
>> Maybe if the kernel advertised a finer resolution through /proc/net/psched,
>> the rate table could be finer grained. This would maintain compatibility
>> between kernel and user space. You would need a new kernel and a
>> new iproute to get nanosecond resolution, but older combinations would
>> still work.
>>
>> The downside is that with nanosecond resolution the time per packet is
>> upper bounded at about 4.29 seconds (2^32 ns).
>>
>>
>

* Re: HTB accuracy on 10GbE
  2009-11-04  3:13     ` Ryousei Takano
  2009-11-04  3:45       ` Ryousei Takano
@ 2009-11-04  5:03       ` Eric Dumazet
  2009-11-04  5:27         ` Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2009-11-04  5:03 UTC
  To: Ryousei Takano
  Cc: Stephen Hemminger, Patrick McHardy, Linux Netdev List, takano-ryousei

Ryousei Takano wrote:
> Hi Patrick and Stephen,
> 
> Thanks for your comments.
> 
> I retried with a newer kernel and iproute2, and added the experimental results
> to my page.  Please see 'Experimental result 2':
>     http://code.google.com/p/pspacer/wiki/HTBon10GbE
> 
> The accuracy improved compared with the previous experiment:
> the deviation dropped from +810 Mbps to +430 Mbps, because
> the timer resolution improved from 1 usec to 1/64 usec.
> But it is still not perfect.
> 

Hmm, did you know that part of the error comes from the user tool itself?

If you compare the iperf results at the sender and the receiver, you'll
see different values: the sender lies a bit.

I tried it here on a Gbit link (I don't have 10GbE yet):

$ ./iperf.bench.sh
.100 104
.200 206
.300 307
.400 413
.500 515
.600 610
.700 715
.800 822
.900 913
1.000 945

while on the receiver:
[  4]  0.0- 5.3 sec  62.8 MBytes    100 Mbits/sec
[  5]  0.0- 5.1 sec    123 MBytes    202 Mbits/sec
[  4]  0.0- 5.1 sec    183 MBytes    303 Mbits/sec
[  5]  0.0- 5.1 sec    246 MBytes    409 Mbits/sec
[  4]  0.0- 5.0 sec    307 MBytes    511 Mbits/sec
[  5]  0.0- 5.0 sec    364 MBytes    607 Mbits/sec
[  4]  0.0- 5.0 sec    427 MBytes    711 Mbits/sec
[  5]  0.0- 5.0 sec    490 MBytes    818 Mbits/sec
[  4]  0.0- 5.0 sec    545 MBytes    909 Mbits/sec
[  5]  0.0- 5.0 sec    565 MBytes    941 Mbits/sec


You might use longer intervals to reduce this error (10 s instead of 5 s):

$./iperf.bench.sh
.100 102
.200 204
.300 305
.400 410
.500 513
.600 608
.700 713
.800 820
.900 911
1.000 943

* Re: HTB accuracy on 10GbE
  2009-11-04  5:03       ` Eric Dumazet
@ 2009-11-04  5:27         ` Eric Dumazet
  2009-11-04  8:19           ` Ryousei Takano
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2009-11-04  5:27 UTC
  To: Ryousei Takano
  Cc: Stephen Hemminger, Patrick McHardy, Linux Netdev List, takano-ryousei

Eric Dumazet wrote:
> 
> Hmm, did you know that part of the error comes from the user tool itself?
> 
> If you compare the iperf results at the sender and the receiver, you'll
> see different values: the sender lies a bit.
> 
> I tried it here on a Gbit link (I don't have 10GbE yet):
> 
> $ ./iperf.bench.sh
> .100 104
> .200 206
> .300 307
> .400 413
> .500 515
> .600 610
> .700 715
> .800 822
> .900 913
> 1.000 945
> 
(that was with the standard 1500-byte MTU)

Now, with a 9000-byte MTU and 50-second samples (instead of 5 s), I get:

$ ./iperf.bench.sh
.100 101
.200 200
.300 301
.400 401
.500 500
.600 601
.700 700
.800 803
.900 903
1.000 991

Not too bad :)

* Re: HTB accuracy on 10GbE
  2009-11-04  5:27         ` Eric Dumazet
@ 2009-11-04  8:19           ` Ryousei Takano
  2009-11-04 11:31             ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Ryousei Takano @ 2009-11-04  8:19 UTC
  To: Eric Dumazet
  Cc: Stephen Hemminger, Patrick McHardy, Linux Netdev List, takano-ryousei

Hi Eric,

On Wed, Nov 4, 2009 at 2:27 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Eric Dumazet wrote:
>>
>> Hmm, did you know that part of the error comes from the user tool itself?
>>
>> If you compare the iperf results at the sender and the receiver, you'll
>> see different values: the sender lies a bit.
>>
>> I tried it here on a Gbit link (I don't have 10GbE yet):
>>
>> $ ./iperf.bench.sh
>> .100 104
>> .200 206
>> .300 307
>> .400 413
>> .500 515
>> .600 610
>> .700 715
>> .800 822
>> .900 913
>> 1.000 945
>>
> (that was with the standard 1500-byte MTU)
>
> Now, with a 9000-byte MTU and 50-second samples (instead of 5 s), I get:
>
> $ ./iperf.bench.sh
> .100 101
> .200 200
> .300 301
> .400 401
> .500 500
> .600 601
> .700 700
> .800 803
> .900 903
> 1.000 991
>
> Not too bad :)
>

I tried iperf with 60-second samples and got almost the same result.

Here is the result:
      sender	receiver
1.000 1.00	1.00
2.000 2.01	2.01
3.000 3.03	3.02
4.000 4.07	4.07
5.000 5.05	5.05
6.000 6.16	6.16
7.000 7.22	7.22
8.000 8.15	8.15
9.000 9.23	9.23
9.900 9.69	9.69

Best regards,
Ryousei Takano

* Re: HTB accuracy on 10GbE
  2009-11-04  8:19           ` Ryousei Takano
@ 2009-11-04 11:31             ` Eric Dumazet
  2009-11-04 13:39               ` Jarek Poplawski
  2009-11-04 16:31               ` Ryousei Takano
  0 siblings, 2 replies; 21+ messages in thread
From: Eric Dumazet @ 2009-11-04 11:31 UTC
  To: Ryousei Takano
  Cc: Stephen Hemminger, Patrick McHardy, Linux Netdev List, takano-ryousei

Ryousei Takano wrote:

> 
> I tried iperf with 60-second samples and got almost the same result.
> 
> Here is the result:
>       sender	receiver
> 1.000 1.00	1.00
> 2.000 2.01	2.01
> 3.000 3.03	3.02
> 4.000 4.07	4.07
> 5.000 5.05	5.05
> 6.000 6.16	6.16
> 7.000 7.22	7.22
> 8.000 8.15	8.15
> 9.000 9.23	9.23
> 9.900 9.69	9.69
> 

One thing to consider is the estimation error in qdisc_l2t(): the rate table has only 256 slots.

static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, unsigned int pktlen)
{
	int slot = pktlen + rtab->rate.cell_align + rtab->rate.overhead;
	if (slot < 0)
		slot = 0;
	slot >>= rtab->rate.cell_log;
	if (slot > 255)
		return (rtab->data[255]*(slot >> 8) + rtab->data[slot & 0xFF]);
	return rtab->data[slot];
}


Maybe you can try changing class mtu to 40000 instead of 9000, and quantum to 60000 too

tc class add dev $DEV parent 1: classid 1:1 htb rate ${rate}mbit mtu 40000 quantum 60000

(because your TCP stack sends large buffers (~60000 bytes), as your NIC can offload TCP segmentation)
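Eric's point can be reproduced with a small C model of the rate table. It derives cell_log from the class mtu, fills 256 slots with truncated transmit times, and looks packets up with the same arithmetic as the qdisc_l2t() quoted above (cell_align and overhead folded into one adjust argument). The cell_log rule and the truncation follow our reading of tc's rate-table code; the 10 Gbit/s rate, the 64 ns tick and all function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RTAB_SLOTS 256

/* Smallest shift that maps the class mtu into 256 slots
 * (our reading of how tc picks cell_log when none is given). */
static int cell_log_for_mtu(unsigned int mtu)
{
    int cell_log = 0;
    while ((mtu >> cell_log) > 255)
        cell_log++;
    return cell_log;
}

/* Slot i holds the transmit time, in ticks truncated toward zero,
 * of a packet of (i + 1) << cell_log bytes. */
static void build_rtab(uint32_t rtab[RTAB_SLOTS], uint64_t rate_bps,
                       int cell_log, uint64_t tick_ns)
{
    assert(rate_bps > 0 && tick_ns > 0);
    for (int i = 0; i < RTAB_SLOTS; i++) {
        uint64_t bytes = (uint64_t)(i + 1) << cell_log;
        rtab[i] = (uint32_t)(bytes * 8 * 1000000000ULL / (rate_bps * tick_ns));
    }
}

/* Same lookup as the kernel's qdisc_l2t(); adjust = cell_align + overhead. */
static uint32_t l2t(const uint32_t rtab[RTAB_SLOTS], int cell_log,
                    unsigned int pktlen, int adjust)
{
    int slot = (int)pktlen + adjust;
    if (slot < 0)
        slot = 0;
    slot >>= cell_log;
    if (slot > 255) /* packet larger than the table: combine entries */
        return rtab[255] * (uint32_t)(slot >> 8) + rtab[slot & 0xFF];
    return rtab[slot];
}
```

At 10 Gbit/s a 65000-byte TSO packet really needs 52 usec, i.e. 812.5 ticks of 64 ns. With mtu 9000 (cell_log 6) the table only reaches 16384 bytes, so the lookup takes the slot > 255 branch and charges 3 * rtab[255] + rtab[247] = 810 ticks; with mtu 40000 (cell_log 8) slot 253 is used directly and gives 812 ticks.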


* Re: HTB accuracy on 10GbE
  2009-11-04 11:31             ` Eric Dumazet
@ 2009-11-04 13:39               ` Jarek Poplawski
  2009-11-04 16:31               ` Ryousei Takano
  1 sibling, 0 replies; 21+ messages in thread
From: Jarek Poplawski @ 2009-11-04 13:39 UTC
  To: Eric Dumazet
  Cc: Ryousei Takano, Stephen Hemminger, Patrick McHardy,
	Linux Netdev List, takano-ryousei

On 04-11-2009 12:31, Eric Dumazet wrote:
....
> Maybe you can try changing class mtu to 40000 instead of 9000, and quantum to 60000 too
> 
> tc class add dev $DEV parent 1: classid 1:1 htb rate ${rate}mbit mtu 40000 quantum 60000
> 
> (because your TCP stack sends large buffers (~60000 bytes), as your NIC can offload TCP segmentation)
> 

Hmm..., testing HTB scheduling accuracy with TSO/GSO on seems kind of
like weather reporting. On the other hand, depending on the hardware,
these rates could be achievable with mtu 9000 and TSO/GSO off, unless
I'm missing something. So maybe such a test would be interesting too?
Then I'd suggest this earlier-mentioned patch to iproute2:
http://marc.info/?l=linux-netdev&m=124453482324409&w=2

Best regards,
Jarek P.

* Re: HTB accuracy on 10GbE
  2009-11-04 11:31             ` Eric Dumazet
  2009-11-04 13:39               ` Jarek Poplawski
@ 2009-11-04 16:31               ` Ryousei Takano
  2009-11-04 17:03                 ` Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Ryousei Takano @ 2009-11-04 16:31 UTC
  To: Eric Dumazet
  Cc: Stephen Hemminger, Patrick McHardy, Linux Netdev List, takano-ryousei

Hi Eric,

Thanks for your suggestion.

On Wed, Nov 4, 2009 at 8:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Ryousei Takano wrote:
>
>>
>> I tried iperf with 60-second samples and got almost the same result.
>>
>> Here is the result:
>>       sender  receiver
>> 1.000 1.00    1.00
>> 2.000 2.01    2.01
>> 3.000 3.03    3.02
>> 4.000 4.07    4.07
>> 5.000 5.05    5.05
>> 6.000 6.16    6.16
>> 7.000 7.22    7.22
>> 8.000 8.15    8.15
>> 9.000 9.23    9.23
>> 9.900 9.69    9.69
>>
>
> One thing to consider is the estimation error in qdisc_l2t(): the rate table has only 256 slots.
>
> static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, unsigned int pktlen)
> {
>        int slot = pktlen + rtab->rate.cell_align + rtab->rate.overhead;
>        if (slot < 0)
>                slot = 0;
>        slot >>= rtab->rate.cell_log;
>        if (slot > 255)
>                return (rtab->data[255]*(slot >> 8) + rtab->data[slot & 0xFF]);
>        return rtab->data[slot];
> }
>
>
> Maybe you can try changing class mtu to 40000 instead of 9000, and quantum to 60000 too
>
> tc class add dev $DEV parent 1: classid 1:1 htb rate ${rate}mbit mtu 40000 quantum 60000
>
> (because your TCP stack sends large buffers (~60000 bytes), as your NIC can offload TCP segmentation)
>
>
You are right!
I am using TSO. The myri10ge driver is passing 64KB packets to the NIC.
I changed the class mtu parameter to 64000 instead of 9000.

Here is the result:
1.000 1.00
2.000 2.01
3.000 2.99
4.000 4.01
5.000 5.01
6.000 6.04
7.000 7.06
8.000 8.09
9.000 9.11
9.900 9.64

It's not so bad!
For more information, I updated the results on my page.

Best regards,
Ryousei
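The improvement can be put in numbers by computing each row's deviation from the configured rate. The C sketch below hard-codes the two tables from this message (the receiver column of the class mtu 9000 run, and the class mtu 64000 run); the array and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Configured HTB rate for each row of both tables (Gbit/s). */
static const double target_gbps[] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.9};
/* Receiver column of the 60-second run with class mtu 9000. */
static const double mtu9000_gbps[] = {1.00, 2.01, 3.02, 4.07, 5.05, 6.16, 7.22, 8.15, 9.23, 9.69};
/* Run with class mtu 64000. */
static const double mtu64000_gbps[] = {1.00, 2.01, 2.99, 4.01, 5.01, 6.04, 7.06, 8.09, 9.11, 9.64};

#define NROWS (sizeof target_gbps / sizeof target_gbps[0])

/* Signed deviation of the measured rate from the configured one, in percent. */
static double deviation_pct(double target, double measured)
{
    assert(target > 0.0);
    return (measured - target) / target * 100.0;
}

/* Largest absolute deviation over all rows; measured must have NROWS entries. */
static double worst_abs_deviation(const double *measured)
{
    double worst = 0.0;
    for (size_t i = 0; i < NROWS; i++) {
        double d = deviation_pct(target_gbps[i], measured[i]);
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

With class mtu 9000 the worst row overshoots by about 3.1% (the 7 Gbit/s row). With mtu 64000 the overshoot stays under about 1.3%, and the largest remaining deviation is the 9.9 Gbit/s row, which undershoots by about 2.6%.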

* Re: HTB accuracy on 10GbE
  2009-11-04 16:31               ` Ryousei Takano
@ 2009-11-04 17:03                 ` Eric Dumazet
  2009-11-05  7:08                   ` Ryousei Takano
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2009-11-04 17:03 UTC
  To: Ryousei Takano
  Cc: Stephen Hemminger, Patrick McHardy, Linux Netdev List, takano-ryousei

Ryousei Takano wrote:
> Hi Eric,
> 
> Thanks for your suggestion.
> 
> On Wed, Nov 4, 2009 at 8:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Ryousei Takano wrote:
>>
>>> I tried iperf with 60-second samples and got almost the same result.
>>>
>>> Here is the result:
>>>       sender  receiver
>>> 1.000 1.00    1.00
>>> 2.000 2.01    2.01
>>> 3.000 3.03    3.02
>>> 4.000 4.07    4.07
>>> 5.000 5.05    5.05
>>> 6.000 6.16    6.16
>>> 7.000 7.22    7.22
>>> 8.000 8.15    8.15
>>> 9.000 9.23    9.23
>>> 9.900 9.69    9.69
>>>
>> One thing to consider is the estimation error in qdisc_l2t(): the rate table has only 256 slots.
>>
>> static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, unsigned int pktlen)
>> {
>>        int slot = pktlen + rtab->rate.cell_align + rtab->rate.overhead;
>>        if (slot < 0)
>>                slot = 0;
>>        slot >>= rtab->rate.cell_log;
>>        if (slot > 255)
>>                return (rtab->data[255]*(slot >> 8) + rtab->data[slot & 0xFF]);
>>        return rtab->data[slot];
>> }
>>
>>
>> Maybe you can try changing class mtu to 40000 instead of 9000, and quantum to 60000 too
>>
>> tc class add dev $DEV parent 1: classid 1:1 htb rate ${rate}mbit mtu 40000 quantum 60000
>>
>> (because your TCP stack sends large buffers (~60000 bytes), as your NIC can offload TCP segmentation)
>>
>>
> You are right!
> I am using TSO. The myri10ge driver is passing 64KB packets to the NIC.
> I changed the class mtu parameter to 64000 instead of 9000.
> 
> Here is the result:
> 1.000 1.00
> 2.000 2.01
> 3.000 2.99
> 4.000 4.01
> 5.000 5.01
> 6.000 6.04
> 7.000 7.06
> 8.000 8.09
> 9.000 9.11
> 9.900 9.64
> 
> It's not so bad!
> For more information, I updated the results on my page.
> 


In fact, I suggested 40000 because rtab will then contain 256 elements,
covering packet sizes from 0 to 65280 bytes in 256-byte steps.

If you use 64000, you lose some precision (for small packets, for example).




* Re: HTB accuracy on 10GbE
  2009-11-04 17:03                 ` Eric Dumazet
@ 2009-11-05  7:08                   ` Ryousei Takano
  2009-11-05  7:10                     ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Ryousei Takano @ 2009-11-05  7:08 UTC
  To: Eric Dumazet
  Cc: Stephen Hemminger, Patrick McHardy, Linux Netdev List, takano-ryousei

Hi Eric,

On Thu, Nov 5, 2009 at 2:03 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Ryousei Takano wrote:
>> Hi Eric,
>>
>> Thanks for your suggestion.
>>
>> On Wed, Nov 4, 2009 at 8:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> Ryousei Takano wrote:
>>>
>>>> I tried iperf with 60-second samples and got almost the same result.
>>>>
>>>> Here is the result:
>>>> rate  sender  receiver   (Gbit/s; rate = configured HTB rate)
>>>> 1.000 1.00    1.00
>>>> 2.000 2.01    2.01
>>>> 3.000 3.03    3.02
>>>> 4.000 4.07    4.07
>>>> 5.000 5.05    5.05
>>>> 6.000 6.16    6.16
>>>> 7.000 7.22    7.22
>>>> 8.000 8.15    8.15
>>>> 9.000 9.23    9.23
>>>> 9.900 9.69    9.69
>>>>
>>> One thing to consider is the estimation error in qdisc_l2t(), rate table has only 256 slots
>>>
>>> static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, unsigned int pktlen)
>>> {
>>>        int slot = pktlen + rtab->rate.cell_align + rtab->rate.overhead;
>>>        if (slot < 0)
>>>                slot = 0;
>>>        slot >>= rtab->rate.cell_log;
>>>        if (slot > 255)
>>>                return (rtab->data[255]*(slot >> 8) + rtab->data[slot & 0xFF]);
>>>        return rtab->data[slot];
>>> }
>>>
>>>
>>> Maybe you can try changing class mtu to 40000 instead of 9000, and quantum to 60000 too
>>>
>>> tc class add dev $DEV parent 1: classid 1:1 htb rate ${rate}mbit mtu 40000 quantum 60000
>>>
>>> (because your TCP stack sends large buffers (~60000 bytes), as your NIC can offload TCP segmentation)
>>>
>>>
>> You are right!
>> I am using TSO. The myri10ge driver is passing 64KB packets to the NIC.
>> I changed the class mtu parameter to 64000 instead of 9000.
>>
>> Here is the result (configured HTB rate vs. measured rate, Gbit/s):
>> 1.000 1.00
>> 2.000 2.01
>> 3.000 2.99
>> 4.000 4.01
>> 5.000 5.01
>> 6.000 6.04
>> 7.000 7.06
>> 8.000 8.09
>> 9.000 9.11
>> 9.900 9.64
>>
>> It's not so bad!
>> For more information, I updated the results on my page.
>>
>
>
> In fact, I gave you 40000 because the rtab will then contain 256 elements,
> spaced 256 bytes apart, from 0 to 65280.
>
> If you use 64000, you lose some precision (for small packets, for example).
>
I see.

In my experiment, it is not a big problem, since I do not send short packets.
I got almost the same results in both cases, "mtu 64000" and
"mtu 40000 quantum 60000".

Anyway, setting an mtu larger than the physical MTU does not quite make sense.

Best regards,
Ryousei


* Re: HTB accuracy on 10GbE
  2009-11-05  7:08                   ` Ryousei Takano
@ 2009-11-05  7:10                     ` Eric Dumazet
  2009-11-05 10:15                       ` Ryousei Takano
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2009-11-05  7:10 UTC
  To: Ryousei Takano
  Cc: Stephen Hemminger, Patrick McHardy, Linux Netdev List, takano-ryousei

Ryousei Takano wrote:
> In my experiment, it is not a big problem, since I do not send short packets.
> I got almost the same results in both cases, "mtu 64000" and
> "mtu 40000 quantum 60000".
> 
> Anyway, setting an mtu larger than the physical MTU does not quite make sense.
> 

The tc class mtu is a hint given to the stack about the average packet size,
i.e., it is not related to the physical MTU (because of TSO).

You could use the same mtu, but disable TSO on the device.


* Re: HTB accuracy on 10GbE
  2009-11-05  7:10                     ` Eric Dumazet
@ 2009-11-05 10:15                       ` Ryousei Takano
  0 siblings, 0 replies; 21+ messages in thread
From: Ryousei Takano @ 2009-11-05 10:15 UTC
  To: Eric Dumazet
  Cc: Stephen Hemminger, Patrick McHardy, Linux Netdev List, takano-ryousei

Hi Eric,

On Thu, Nov 5, 2009 at 4:10 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Ryousei Takano wrote:
>> In my experiment, it is not a big problem, since I do not send short packets.
>> I got almost the same results in both cases, "mtu 64000" and
>> "mtu 40000 quantum 60000".
>>
>> Anyway, setting an mtu larger than the physical MTU does not quite make sense.
>>
>
> The tc class mtu is a hint given to the stack about the average packet size,
> i.e., it is not related to the physical MTU (because of TSO).
>
> You could use the same mtu, but disable TSO on the device.
>
I got it.
Thanks for your explanation.

Best regards,
Ryousei


end of thread [~2009-11-05 10:15 UTC]

Thread overview: 21+ messages
2009-11-02  7:22 HTB accuracy on 10GbE Ryousei Takano
2009-11-02  8:17 ` Badalian Vyacheslav
2009-11-02 15:43 ` Patrick McHardy
2009-11-02 20:53   ` Stephen Hemminger
2009-11-03  7:43     ` Badalian Vyacheslav
2009-11-03  9:33       ` Jarek Poplawski
2009-11-03 10:13         ` Badalian Vyacheslav
2009-11-03 10:54           ` Jarek Poplawski
2009-11-03 11:13             ` Badalian Vyacheslav
2009-11-04  3:13     ` Ryousei Takano
2009-11-04  3:45       ` Ryousei Takano
2009-11-04  5:03       ` Eric Dumazet
2009-11-04  5:27         ` Eric Dumazet
2009-11-04  8:19           ` Ryousei Takano
2009-11-04 11:31             ` Eric Dumazet
2009-11-04 13:39               ` Jarek Poplawski
2009-11-04 16:31               ` Ryousei Takano
2009-11-04 17:03                 ` Eric Dumazet
2009-11-05  7:08                   ` Ryousei Takano
2009-11-05  7:10                     ` Eric Dumazet
2009-11-05 10:15                       ` Ryousei Takano
