netfilter-devel.vger.kernel.org archive mirror
* Unable to create htb tc classes more than 64K
@ 2019-08-16 12:48 Akshat Kakkar
  2019-08-16 17:45 ` Cong Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Akshat Kakkar @ 2019-08-16 12:48 UTC (permalink / raw)
  To: netfilter-devel, lartc, netdev

I want to have around 1 million HTB tc classes.
The flat structure of HTB classes allows only 64K classes at once.
It should be possible to go beyond that by nesting qdiscs and classes
into a hierarchy.
For this I tried something like this:

tc qdisc add dev eno2 root handle 100: htb
tc class add dev eno2 parent 100: classid 100:1 htb rate 100Mbps
tc class add dev eno2 parent 100: classid 100:2 htb rate 100Mbps

tc qdisc add dev eno2 parent 100:1 handle 1: htb
tc class add dev eno2 parent 1: classid 1:10 htb rate 100kbps
tc class add dev eno2 parent 1: classid 1:20 htb rate 300kbps

tc qdisc add dev eno2 parent 100:2 handle 2: htb
tc class add dev eno2 parent 2: classid 2:10 htb rate 100kbps
tc class add dev eno2 parent 2: classid 2:20 htb rate 300kbps

What I want is something like:
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000001 fw flowid 1:10
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000002 fw flowid 1:20
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000003 fw flowid 2:10
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000004 fw flowid 2:20

But I am unable to shape my traffic with any of 1:10, 1:20, 2:10 or 2:20.

Can you please suggest where it is going wrong?
Is it not possible at all?

-Akshat


* Re: Unable to create htb tc classes more than 64K
  2019-08-16 12:48 Unable to create htb tc classes more than 64K Akshat Kakkar
@ 2019-08-16 17:45 ` Cong Wang
  2019-08-17 12:46   ` Akshat Kakkar
  0 siblings, 1 reply; 16+ messages in thread
From: Cong Wang @ 2019-08-16 17:45 UTC (permalink / raw)
  To: Akshat Kakkar; +Cc: NetFilter, lartc, netdev

On Fri, Aug 16, 2019 at 5:49 AM Akshat Kakkar <akshat.1984@gmail.com> wrote:
>
> I want to have around 1 Million htb tc classes.
> The simple structure of htb tc class, allow having only 64K classes at once.

This is probably due to the limit of the class ID, whose minor part is 16 bits.
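For reference, a tc handle is a 32-bit value with the major number in the upper 16 bits and the minor in the lower 16, both written in hexadecimal. The sketch below packs a major:minor pair that way, which shows why a single qdisc cannot address more than 0xffff classes. The helper name classid_to_u32 is made up for illustration and is not part of iproute2.

```shell
#!/bin/sh
# Pack a tc "major:minor" pair (hex digits, as tc writes them) into the
# 32-bit handle: major in the upper 16 bits, minor in the lower 16 bits.
# classid_to_u32 is an illustrative helper, not part of iproute2.
classid_to_u32() {
    printf '0x%08x\n' $(( (0x$1 << 16) | 0x$2 ))
}

classid_to_u32 100 1      # class 100:1 under the root qdisc 100:
classid_to_u32 1 ffff     # the largest minor a qdisc with major 1 can hold
```

Minor 0 denotes the qdisc itself, so the usable class minors run from 1 to ffff.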


> But, it is possible to make it more hierarchical using hierarchy of
> qdisc and classes.
> For this I tried something like this
>
> tc qdisc add dev eno2 root handle 100: htb
> tc class add dev eno2 parent 100: classid 100:1 htb rate 100Mbps
> tc class add dev eno2 parent 100: classid 100:2 htb rate 100Mbps
>
> tc qdisc add dev eno2 parent 100:1 handle 1: htb
> tc class add dev eno2 parent 1: classid 1:10 htb rate 100kbps
> tc class add dev eno2 parent 1: classid 1:20 htb rate 300kbps
>
> tc qdisc add dev eno2 parent 100:2 handle 2: htb
> tc class add dev eno2 parent 2: classid 2:10 htb rate 100kbps
> tc class add dev eno2 parent 2: classid 2:20 htb rate 300kbps
>
> What I want is something like:
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000001 fw flowid 1:10
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000002 fw flowid 1:20
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000003 fw flowid 2:10
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000004 fw flowid 2:20
>
> But I am unable to shape my traffic by any of 1:10, 1:20, 2:10 or 2:20.
>
> Can you please suggest, where is it going wrong?
> Is it not possible altogether?

A filter can only select classes on its own level; you are
trying to select the child classes from the root, which doesn't work.
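A hedged sketch of what classifying on both levels could look like for the tree in this thread: the root qdisc 100: gets a filter that picks 100:1 or 100:2 (and thereby the child qdisc), and each child qdisc gets its own filter that picks the leaf. The helper name two_level_filters and the choice to print the commands rather than run them are assumptions made for illustration; this has not been tested on the poster's setup.

```shell
#!/bin/sh
# Print (not run) fw filters for the tree from this thread, one pair per mark:
#   stage 1, on root qdisc 100:, selects the class that owns the child qdisc;
#   stage 2, on that child qdisc, selects the leaf class.
# two_level_filters is an illustrative helper; DEV defaults to the thread's eno2.
DEV=${DEV:-eno2}

two_level_filters() {
    # $1 fwmark, $2 root-level class, $3 child qdisc, $4 leaf class
    echo "tc filter add dev $DEV parent 100: protocol ip prio 1 handle $1 fw flowid $2"
    echo "tc filter add dev $DEV parent $3 protocol ip prio 1 handle $1 fw flowid $4"
}

two_level_filters 0x1 100:1 1: 1:10
two_level_filters 0x2 100:1 1: 1:20
two_level_filters 0x3 100:2 2: 2:10
two_level_filters 0x4 100:2 2: 2:20
```

The printed lines can be reviewed and then piped to sh as root. Leaves under 1: stay capped by the rate of 100:1, and leaves under 2: by the rate of 100:2.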

Thanks.


* Re: Unable to create htb tc classes more than 64K
  2019-08-16 17:45 ` Cong Wang
@ 2019-08-17 12:46   ` Akshat Kakkar
  2019-08-17 18:24     ` Cong Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Akshat Kakkar @ 2019-08-17 12:46 UTC (permalink / raw)
  To: Cong Wang; +Cc: NetFilter, lartc, netdev

I agree that the 16-bit minor ID of the class is what
restricts it to 64K.
The point is, can we use multilevel qdiscs and classes to extend it
to more than 64K classes?

One scheme can be like
                  100:                          root qdisc
                /  |  \
              /    |    \
         100:1   100:2   100:3                  child classes
           |       |       |
           1:      2:      3:                   qdiscs
          / \     / \     / \
       1:1   1:2       3:1   3:2                leaf classes

with all qdiscs and classes defined as htb.

Is this the correct approach? Are there any alternatives?

Besides, to direct traffic to the leaf classes 1:1, 1:2, 2:1,
2:2, 3:1, 3:2, ..., I am not using filters but ipset with
skbprio and an iptables map-set rule.
But even with all this it doesn't work. Why?

What am I missing?


* Re: Unable to create htb tc classes more than 64K
  2019-08-17 12:46   ` Akshat Kakkar
@ 2019-08-17 18:24     ` Cong Wang
  2019-08-17 19:04       ` Akshat Kakkar
  0 siblings, 1 reply; 16+ messages in thread
From: Cong Wang @ 2019-08-17 18:24 UTC (permalink / raw)
  To: Akshat Kakkar; +Cc: NetFilter, lartc, netdev

On Sat, Aug 17, 2019 at 5:46 AM Akshat Kakkar <akshat.1984@gmail.com> wrote:
>
> I agree that it is because of 16bit of minor I'd of class which
> restricts it to 64K.
> Point is, can we use multilevel qdisc and classes to extend it to more
> no. of classes i.e. to more than 64K classes

If your goal is merely having as many classes as you can, then yes.


>
> One scheme can be like
>                                       100: root qdisc
>                                          |
>                                        / | \
>                                      /   |   \
>                                    /     |     \
>                                  /       |       \
>                           100:1   100:2   100:3        child classes
>                             |              |           |
>                             |              |           |
>                             |              |           |
>                            1:            2:          3:     qdisc
>                            / \           / \           / \
>                          /     \                     /     \
>                       1:1    1:2             3:1      3:2 leaf classes
>
> with all qdisc and classes defined as htb.
>
> Is this correct approach? Any alternative??

Again, depends on what your goal is.


>
> Besides, in order to direct traffic to leaf classes 1:1, 1:2, 2:1,
> 2:2, 3:1, 3:2 .... , instead of using filters I am using ipset with
> skbprio and iptables map-set match rule.
> But even after all this it don't work. Why?

Again, the filters you use to classify the packets can only
work for the classes on the same level, not the next level.


Thanks.


* Re: Unable to create htb tc classes more than 64K
  2019-08-17 18:24     ` Cong Wang
@ 2019-08-17 19:04       ` Akshat Kakkar
  2019-08-20  6:26         ` Akshat Kakkar
                           ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Akshat Kakkar @ 2019-08-17 19:04 UTC (permalink / raw)
  To: Cong Wang; +Cc: NetFilter, lartc, netdev

On Sat, Aug 17, 2019 at 11:54 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Sat, Aug 17, 2019 at 5:46 AM Akshat Kakkar <akshat.1984@gmail.com> wrote:
> >
> > I agree that it is because of 16bit of minor I'd of class which
> > restricts it to 64K.
> > Point is, can we use multilevel qdisc and classes to extend it to more
> > no. of classes i.e. to more than 64K classes
>
> If your goal is merely having as many classes as you can, then yes.
My goal is not just to create as many classes as possible, but also to
use them for rate limiting per IP per server. Say I have a list of
10000 IPs and more than 100 servers. If I want a few IPs to
get 1 Mbps per server and the others 2 Mbps per
server, how can I achieve this without having 10000 x 100 classes?
These numbers can grow larger than this, hence I am looking for a
generic solution.
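To put numbers on this, the sketch below works out how many child qdiscs a 10000 x 100 layout would need when each qdisc holds 65535 classes, and maps a flat limiter index to a major:minor classid. The numbering scheme (child majors starting at 1, minors starting at 1) and the helper name index_to_classid are assumptions for illustration only.

```shell
#!/bin/sh
# Map a flat 0-based limiter index to a "major:minor" classid (hex, as tc
# expects). Each child qdisc holds minors 1..0xffff, so 65535 classes.
# index_to_classid and the numbering scheme are illustrative assumptions.
PER_QDISC=65535

index_to_classid() {
    printf '%x:%x\n' $(( $1 / PER_QDISC + 1 )) $(( $1 % PER_QDISC + 1 ))
}

TOTAL=$(( 10000 * 100 ))                                  # IPs x servers
echo "child qdiscs needed: $(( (TOTAL + PER_QDISC - 1) / PER_QDISC ))"
index_to_classid 0
index_to_classid $(( TOTAL - 1 ))
```

With these numbers the whole set fits under 16 child qdiscs, which stays well clear of the root handle 100: used in this thread.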

>
>
> >
> > One scheme can be like
> >                                       100: root qdisc
> >                                          |
> >                                        / | \
> >                                      /   |   \
> >                                    /     |     \
> >                                  /       |       \
> >                           100:1   100:2   100:3        child classes
> >                             |              |           |
> >                             |              |           |
> >                             |              |           |
> >                            1:            2:          3:     qdisc
> >                            / \           / \           / \
> >                          /     \                     /     \
> >                       1:1    1:2             3:1      3:2 leaf classes
> >
> > with all qdisc and classes defined as htb.
> >
> > Is this correct approach? Any alternative??
>
> Again, depends on what your goal is.
>
>
> >
> > Besides, in order to direct traffic to leaf classes 1:1, 1:2, 2:1,
> > 2:2, 3:1, 3:2 .... , instead of using filters I am using ipset with
> > skbprio and iptables map-set match rule.
> > But even after all this it don't work. Why?
>
> Again, the filters you use to classify the packets could only
> work for the classes on the same level, no the next level.

I am using ipset + iptables to classify, not filters. Besides, if
tc allows me to define a qdisc -> classes -> qdisc -> classes
(1,2,3 ...) structure (i.e. like the one shown in the ASCII tree),
then how can those lowest child classes actually be used?

>
>
> Thanks.


* Re: Unable to create htb tc classes more than 64K
  2019-08-17 19:04       ` Akshat Kakkar
@ 2019-08-20  6:26         ` Akshat Kakkar
  2019-08-21 22:06         ` Cong Wang
  2019-08-26 16:45         ` Jesper Dangaard Brouer
  2 siblings, 0 replies; 16+ messages in thread
From: Akshat Kakkar @ 2019-08-20  6:26 UTC (permalink / raw)
  To: Cong Wang, Anton Danilov; +Cc: NetFilter, lartc, netdev

>> If your goal is merely having as many classes as you can, then yes.

My goal is not just to create as many classes as possible, but also to
use them for rate limiting per IP per server. Say I have a list of
10000 IPs and more than 100 servers. If I want a few IPs to
get 1 Mbps per server and the others 2 Mbps per
server, how can I achieve this without having 10000 x 100 classes?
These numbers can grow larger than this, hence I am looking for a
generic solution.

I am using ipset + iptables to classify, not filters.

Besides, if tc allows me to define a qdisc (100:) -> classes
(100:1) -> qdisc (1:  2:  3:) -> classes (1:1,1:2   2:1,2:2    3:1,
3:2   ...) structure (i.e. like the one shown in the ASCII tree),
then how and where should those lowest child classes actually be
used?


* Re: Unable to create htb tc classes more than 64K
  2019-08-17 19:04       ` Akshat Kakkar
  2019-08-20  6:26         ` Akshat Kakkar
@ 2019-08-21 22:06         ` Cong Wang
  2019-08-22  5:59           ` Akshat Kakkar
  2019-08-26 16:45         ` Jesper Dangaard Brouer
  2 siblings, 1 reply; 16+ messages in thread
From: Cong Wang @ 2019-08-21 22:06 UTC (permalink / raw)
  To: Akshat Kakkar; +Cc: NetFilter, lartc, netdev

On Sat, Aug 17, 2019 at 12:04 PM Akshat Kakkar <akshat.1984@gmail.com> wrote:
> I am using ipset +  iptables to classify and not filters. Besides, if
> tc is allowing me to define qdisc -> classes -> qdsic -> classes
> (1,2,3 ...) sort of structure (ie like the one shown in ascii tree)
> then how can those lowest child classes be actually used or consumed?

Just install tc filters on the lower level too.


* Re: Unable to create htb tc classes more than 64K
  2019-08-21 22:06         ` Cong Wang
@ 2019-08-22  5:59           ` Akshat Kakkar
  2019-08-25 17:52             ` Cong Wang
  0 siblings, 1 reply; 16+ messages in thread
From: Akshat Kakkar @ 2019-08-22  5:59 UTC (permalink / raw)
  To: Cong Wang, Anton Danilov; +Cc: NetFilter, lartc, netdev

On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > I am using ipset +  iptables to classify and not filters. Besides, if
> > tc is allowing me to define qdisc -> classes -> qdsic -> classes
> > (1,2,3 ...) sort of structure (ie like the one shown in ascii tree)
> > then how can those lowest child classes be actually used or consumed?
>
> Just install tc filters on the lower level too.

If I understand correctly, you are saying that
instead of:
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000001 fw flowid 1:10
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000002 fw flowid 1:20
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000003 fw flowid 2:10
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000004 fw flowid 2:20

I should do this (i.e. change the parent to the immediate qdisc):
tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001 fw flowid 1:10
tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002 fw flowid 1:20
tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003 fw flowid 2:10
tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004 fw flowid 2:20

I tried this previously, but there is no change in the result.
The behaviour is exactly the same, i.e. I am still getting 100Mbps and not
100kbps or 300kbps.

Besides, as I mentioned previously, I am using ipset + skbprio rather
than filters. I used the filters just to test.

ipset -N foo hash:ip,mark skbinfo

ipset -A foo 10.10.10.10,0x00000001 skbprio 1:10
ipset -A foo 10.10.10.20,0x00000002 skbprio 1:20
ipset -A foo 10.10.10.30,0x00000003 skbprio 2:10
ipset -A foo 10.10.10.40,0x00000004 skbprio 2:20

iptables -t mangle -A POSTROUTING -j SET --map-set foo dst,dst --map-prio

That's why I added Anton Danilov in Cc, so that he can have a look: he
designed the skbprio feature in ipset and will have a
better idea.


* Re: Unable to create htb tc classes more than 64K
  2019-08-22  5:59           ` Akshat Kakkar
@ 2019-08-25 17:52             ` Cong Wang
  2019-08-26  6:32               ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Cong Wang @ 2019-08-25 17:52 UTC (permalink / raw)
  To: Akshat Kakkar; +Cc: Anton Danilov, NetFilter, lartc, netdev

On Wed, Aug 21, 2019 at 11:00 PM Akshat Kakkar <akshat.1984@gmail.com> wrote:
>
> On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > I am using ipset +  iptables to classify and not filters. Besides, if
> > > tc is allowing me to define qdisc -> classes -> qdsic -> classes
> > > (1,2,3 ...) sort of structure (ie like the one shown in ascii tree)
> > > then how can those lowest child classes be actually used or consumed?
> >
> > Just install tc filters on the lower level too.
>
> If I understand correctly, you are saying,
> instead of :
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000001 fw flowid 1:10
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000002 fw flowid 1:20
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000003 fw flowid 2:10
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000004 fw flowid 2:20
>
>
> I should do this: (i.e. changing parent to just immediate qdisc)
> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001
> fw flowid 1:10
> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002
> fw flowid 1:20
> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003
> fw flowid 2:10
> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004
> fw flowid 2:20


Yes, this is what I meant.


>
> I tried this previously. But there is not change in the result.
> Behaviour is exactly same, i.e. I am still getting 100Mbps and not
> 100kbps or 300kbps
>
> Besides, as I mentioned previously I am using ipset + skbprio and not
> filters stuff. Filters I used just to test.
>
> ipset  -N foo hash:ip,mark skbinfo
>
> ipset -A foo 10.10.10.10, 0x0x00000001 skbprio 1:10
> ipset -A foo 10.10.10.20, 0x0x00000002 skbprio 1:20
> ipset -A foo 10.10.10.30, 0x0x00000003 skbprio 2:10
> ipset -A foo 10.10.10.40, 0x0x00000004 skbprio 2:20
>
> iptables -A POSTROUTING -j SET --map-set foo dst,dst --map-prio

Hmm..

I am not familiar with ipset, but it seems to save the skbprio into
skb->priority, so it doesn't need a TC filter to classify it again.

I guess your packets might go to the direct queue of HTB, which
bypasses the token bucket. Can you dump the stats and check?
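One way to check this is to look at the direct_packets_stat counter that HTB reports in its qdisc line. The helper below is a small sketch that pulls that counter out of `tc -s qdisc show` output read from stdin; the name htb_direct_stats is made up, and the sample line only mimics the usual iproute2 layout with an invented counter value.

```shell
#!/bin/sh
# Print "<handle> <direct_packets_stat>" for every HTB qdisc found on stdin.
# Feed it the output of:  tc -s qdisc show dev eno2
# htb_direct_stats is an illustrative helper, not an iproute2 command.
htb_direct_stats() {
    awk '$1 == "qdisc" && $2 == "htb" {
        for (i = 3; i < NF; i++)
            if ($i == "direct_packets_stat") print $3, $(i + 1)
    }'
}

# Sample line in the usual iproute2 layout; the counter value is made up.
echo "qdisc htb 100: root refcnt 2 r2q 10 default 0 direct_packets_stat 4242 direct_qlen 1000" |
    htb_direct_stats
```

A counter on the root qdisc that keeps growing while traffic flows would suggest the packets are leaving through the unshaped direct queue.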

Thanks.


* Re: Unable to create htb tc classes more than 64K
  2019-08-25 17:52             ` Cong Wang
@ 2019-08-26  6:32               ` Eric Dumazet
  2019-08-26  7:28                 ` Toke Høiland-Jørgensen
                                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Eric Dumazet @ 2019-08-26  6:32 UTC (permalink / raw)
  To: Cong Wang, Akshat Kakkar; +Cc: Anton Danilov, NetFilter, lartc, netdev



On 8/25/19 7:52 PM, Cong Wang wrote:
> On Wed, Aug 21, 2019 at 11:00 PM Akshat Kakkar <akshat.1984@gmail.com> wrote:
>>
>> On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>> I am using ipset +  iptables to classify and not filters. Besides, if
>>>> tc is allowing me to define qdisc -> classes -> qdsic -> classes
>>>> (1,2,3 ...) sort of structure (ie like the one shown in ascii tree)
>>>> then how can those lowest child classes be actually used or consumed?
>>>
>>> Just install tc filters on the lower level too.
>>
>> If I understand correctly, you are saying,
>> instead of :
>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>> 0x00000001 fw flowid 1:10
>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>> 0x00000002 fw flowid 1:20
>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>> 0x00000003 fw flowid 2:10
>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>> 0x00000004 fw flowid 2:20
>>
>>
>> I should do this: (i.e. changing parent to just immediate qdisc)
>> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001
>> fw flowid 1:10
>> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002
>> fw flowid 1:20
>> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003
>> fw flowid 2:10
>> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004
>> fw flowid 2:20
> 
> 
> Yes, this is what I meant.
> 
> 
>>
>> I tried this previously. But there is not change in the result.
>> Behaviour is exactly same, i.e. I am still getting 100Mbps and not
>> 100kbps or 300kbps
>>
>> Besides, as I mentioned previously I am using ipset + skbprio and not
>> filters stuff. Filters I used just to test.
>>
>> ipset  -N foo hash:ip,mark skbinfo
>>
>> ipset -A foo 10.10.10.10, 0x0x00000001 skbprio 1:10
>> ipset -A foo 10.10.10.20, 0x0x00000002 skbprio 1:20
>> ipset -A foo 10.10.10.30, 0x0x00000003 skbprio 2:10
>> ipset -A foo 10.10.10.40, 0x0x00000004 skbprio 2:20
>>
>> iptables -A POSTROUTING -j SET --map-set foo dst,dst --map-prio
> 
> Hmm..
> 
> I am not familiar with ipset, but it seems to save the skbprio into
> skb->priority, so it doesn't need TC filter to classify it again.
> 
> I guess your packets might go to the direct queue of HTB, which
> bypasses the token bucket. Can you dump the stats and check?

With more than 64K 'classes' I suggest using a single FQ qdisc [1] and
an eBPF program using the EDT model (Earliest Departure Time).

The BPF program would perform the classification, then find a data structure
based on the 'class', and then update/maintain class virtual times and skb->tstamp.

TBF = bpf_map_lookup_elem(&map, &classid);

uint64_t now = bpf_ktime_get_ns();
uint64_t time_to_send = max(TBF->time_to_send, now);

time_to_send += (u64)qdisc_pkt_len(skb) * NSEC_PER_SEC / TBF->rate;
if (time_to_send > TBF->max_horizon) {
    return TC_ACT_SHOT;
}
TBF->time_to_send = time_to_send;
skb->tstamp = max(time_to_send, skb->tstamp);
if (time_to_send - now > TBF->ecn_horizon)
    bpf_skb_ecn_set_ce(skb);
return TC_ACT_OK;

tools/testing/selftests/bpf/progs/test_tc_edt.c shows something similar.
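To make the timestamp arithmetic in the pseudocode above concrete, here is a userspace sketch of the same update in plain shell. It is not BPF code and leaves out the horizon drop and ECN marking; edt_next is a made-up helper name, times are in nanoseconds, and the rate is in bytes per second.

```shell
#!/bin/sh
# Userspace arithmetic for the EDT update in the pseudocode above; this is
# not BPF code. Times are in ns, length in bytes, rate in bytes per second.
# edt_next is an illustrative helper name.
NSEC_PER_SEC=1000000000

edt_next() {
    # $1 previous time_to_send, $2 now, $3 packet length, $4 rate
    start=$1
    [ "$2" -gt "$start" ] && start=$2          # max(time_to_send, now)
    echo $(( start + $3 * NSEC_PER_SEC / $4 ))
}

# Three back-to-back 1500-byte packets at 125000 B/s (1 Mbit/s), clock at 0:
t=0
for n in 1 2 3; do
    t=$(edt_next "$t" 0 1500 125000)
    echo "packet $n departs at $t ns"
done
```

Each packet pushes the class's next departure time out by its own transmission time at the configured rate, which is what spaces the packets once FQ honours skb->tstamp.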


[1]  MQ + FQ if the device is multi-queue.

   Note that this setup scales very well on SMP, since we are no longer forced
 to use a single HTB hierarchy (protected by a single spinlock).



* Re: Unable to create htb tc classes more than 64K
  2019-08-26  6:32               ` Eric Dumazet
@ 2019-08-26  7:28                 ` Toke Høiland-Jørgensen
  2019-08-27 20:53                 ` Dave Taht
  2020-01-10 12:38                 ` Akshat Kakkar
  2 siblings, 0 replies; 16+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-08-26  7:28 UTC (permalink / raw)
  To: Eric Dumazet, Cong Wang, Akshat Kakkar
  Cc: Anton Danilov, NetFilter, lartc, netdev

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On 8/25/19 7:52 PM, Cong Wang wrote:
>> On Wed, Aug 21, 2019 at 11:00 PM Akshat Kakkar <akshat.1984@gmail.com> wrote:
>>>
>>> On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>> I am using ipset +  iptables to classify and not filters. Besides, if
>>>>> tc is allowing me to define qdisc -> classes -> qdsic -> classes
>>>>> (1,2,3 ...) sort of structure (ie like the one shown in ascii tree)
>>>>> then how can those lowest child classes be actually used or consumed?
>>>>
>>>> Just install tc filters on the lower level too.
>>>
>>> If I understand correctly, you are saying,
>>> instead of :
>>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>>> 0x00000001 fw flowid 1:10
>>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>>> 0x00000002 fw flowid 1:20
>>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>>> 0x00000003 fw flowid 2:10
>>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>>> 0x00000004 fw flowid 2:20
>>>
>>>
>>> I should do this: (i.e. changing parent to just immediate qdisc)
>>> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001
>>> fw flowid 1:10
>>> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002
>>> fw flowid 1:20
>>> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003
>>> fw flowid 2:10
>>> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004
>>> fw flowid 2:20
>> 
>> 
>> Yes, this is what I meant.
>> 
>> 
>>>
>>> I tried this previously. But there is not change in the result.
>>> Behaviour is exactly same, i.e. I am still getting 100Mbps and not
>>> 100kbps or 300kbps
>>>
>>> Besides, as I mentioned previously I am using ipset + skbprio and not
>>> filters stuff. Filters I used just to test.
>>>
>>> ipset  -N foo hash:ip,mark skbinfo
>>>
>>> ipset -A foo 10.10.10.10, 0x0x00000001 skbprio 1:10
>>> ipset -A foo 10.10.10.20, 0x0x00000002 skbprio 1:20
>>> ipset -A foo 10.10.10.30, 0x0x00000003 skbprio 2:10
>>> ipset -A foo 10.10.10.40, 0x0x00000004 skbprio 2:20
>>>
>>> iptables -A POSTROUTING -j SET --map-set foo dst,dst --map-prio
>> 
>> Hmm..
>> 
>> I am not familiar with ipset, but it seems to save the skbprio into
>> skb->priority, so it doesn't need TC filter to classify it again.
>> 
>> I guess your packets might go to the direct queue of HTB, which
>> bypasses the token bucket. Can you dump the stats and check?
>
> With more than 64K 'classes' I suggest to use a single FQ qdisc [1], and
> an eBPF program using EDT model (Earliest Departure Time)
>
> The BPF program would perform the classification, then find a data structure
> based on the 'class', and then update/maintain class virtual times and skb->tstamp
>
> TBF = bpf_map_lookup_elem(&map, &classid);
>
> uint64_t now = bpf_ktime_get_ns();
> uint64_t time_to_send = max(TBF->time_to_send, now);
>
> time_to_send += (u64)qdisc_pkt_len(skb) * NSEC_PER_SEC / TBF->rate;
> if (time_to_send > TBF->max_horizon) {
>     return TC_ACT_SHOT;
> }
> TBF->time_to_send = time_to_send;
> skb->tstamp = max(time_to_send, skb->tstamp);
> if (time_to_send - now > TBF->ecn_horizon)
>     bpf_skb_ecn_set_ce(skb);
> return TC_ACT_OK;
>
> tools/testing/selftests/bpf/progs/test_tc_edt.c shows something similar.
>
>
> [1]  MQ + FQ if the device is multi-queues.
>
>    Note that this setup scales very well on SMP, since we no longer are forced
>  to use a single HTB hierarchy (protected by a single spinlock)

Wow, this is very cool! Thanks for that walk-through, Eric :)

-Toke


* Re: Unable to create htb tc classes more than 64K
  2019-08-17 19:04       ` Akshat Kakkar
  2019-08-20  6:26         ` Akshat Kakkar
  2019-08-21 22:06         ` Cong Wang
@ 2019-08-26 16:45         ` Jesper Dangaard Brouer
  2 siblings, 0 replies; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2019-08-26 16:45 UTC (permalink / raw)
  To: Akshat Kakkar
  Cc: brouer, Cong Wang, NetFilter, lartc, netdev, Eric Dumazet,
	Toke Høiland-Jørgensen, Anton Danilov

On Sun, 18 Aug 2019 00:34:33 +0530
Akshat Kakkar <akshat.1984@gmail.com> wrote:

> My goal is not just to make as many classes as possible, but also to
> use them to do rate limiting per ip per server. Say, I have a list of
> 10000 IPs and more than 100 servers. So simply if I want few IPs to
> get speed of says 1Mbps per server but others say speed of 2 Mbps per
> server. How can I achieve this without having 10000 x 100 classes.
> These numbers can be large than this and hence I am looking for a
> generic solution to this.

As Eric Dumazet also points out indirectly, you will be creating a huge
bottleneck for SMP/multi-core CPUs, because your HTB root qdisc is a
serialization point for all egress traffic that all CPUs will need to
take a lock on.

It sounds like your use-case is not global rate limiting, but instead
the goal is to rate limit customers or services (to something
significantly lower than NIC link speed).  To get scalability, in this
case, you can instead use the MQ qdisc (as Eric also points out).
I have an example script here[1] that shows how to set up MQ as the root
qdisc and add HTB leaves based on how many TX queues the interface has
via /sys/class/net/$DEV/queues/tx-*/

[1] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/bin/tc_mq_htb_setup_example.sh
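A minimal sketch of the MQ-plus-HTB layout described above, written independently of the linked script: it counts the TX queues in sysfs and prints one HTB qdisc per queue under an MQ root. The handles 7fff: and N:, the helper name mq_htb_cmds, and the print-only approach are illustrative assumptions; see the linked script for a real setup.

```shell
#!/bin/sh
# Print (not run) tc commands that put MQ at the root and one HTB qdisc under
# each TX queue. Handles 7fff: and N: are illustrative choices, and
# mq_htb_cmds is a made-up helper; see the linked script for a real setup.
mq_htb_cmds() {
    # $1 device, $2 number of TX queues
    echo "tc qdisc replace dev $1 root handle 7fff: mq"
    i=1
    while [ "$i" -le "$2" ]; do
        h=$(printf '%x' "$i")                  # tc parses handles as hex
        echo "tc qdisc add dev $1 parent 7fff:$h handle $h: htb"
        i=$(( i + 1 ))
    done
}

DEV=${DEV:-eno2}
# Count TX queues from sysfs; fall back to 1 if the device is absent here.
NQ=$(( $(ls -d /sys/class/net/"$DEV"/queues/tx-* 2>/dev/null | wc -l) ))
[ "$NQ" -gt 0 ] || NQ=1
mq_htb_cmds "$DEV" "$NQ"
```

Each per-queue HTB instance has its own lock, so classes and rates then have to be created under every N: qdisc that can carry the traffic in question.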


You are not done yet.  To solve the TX-queue lock congestion, the
traffic needs to be redirected to the appropriate TX CPUs. This
can be done with an RSS (Receive Side Scaling) HW ethtool
adjustment (reduce the hash to L3, i.e. IP addresses only), with RPS
(Receive Packet Steering), or with XDP cpumap redirect.

The XDP cpumap redirect feature is implemented with XDP+TC BPF code
here[2]. Note that XPS can interfere with this, so there is an XPS disable
script here[3].


[2] https://github.com/xdp-project/xdp-cpumap-tc
[3] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/bin/xps_setup.sh
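As a rough illustration of two of the steps mentioned above, the sketch below prints the ethtool calls that restrict the RSS hash to source and destination IP, and the sysfs writes that clear the XPS CPU masks. It is not taken from the linked repository; steering_cmds is a made-up helper, and whether a given NIC accepts the rx-flow-hash change is an assumption to verify.

```shell
#!/bin/sh
# Print (not run) commands that hash RX flows on IP addresses only and clear
# the XPS CPU masks (the thread notes XPS can interfere with the steering).
# steering_cmds is an illustrative helper; check your NIC supports each step.
steering_cmds() {
    # $1 device, $2 number of TX queues
    for proto in tcp4 udp4; do
        echo "ethtool -N $1 rx-flow-hash $proto sd"   # s = src IP, d = dst IP
    done
    q=0
    while [ "$q" -lt "$2" ]; do
        echo "echo 0 > /sys/class/net/$1/queues/tx-$q/xps_cpus"
        q=$(( q + 1 ))
    done
}

steering_cmds "${DEV:-eno2}" 4
```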

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: Unable to create htb tc classes more than 64K
  2019-08-26  6:32               ` Eric Dumazet
  2019-08-26  7:28                 ` Toke Høiland-Jørgensen
@ 2019-08-27 20:53                 ` Dave Taht
  2019-08-27 21:09                   ` Eric Dumazet
  2020-01-10 12:38                 ` Akshat Kakkar
  2 siblings, 1 reply; 16+ messages in thread
From: Dave Taht @ 2019-08-27 20:53 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cong Wang, Akshat Kakkar, Anton Danilov, NetFilter, lartc, netdev

On Sun, Aug 25, 2019 at 11:47 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 8/25/19 7:52 PM, Cong Wang wrote:
> > On Wed, Aug 21, 2019 at 11:00 PM Akshat Kakkar <akshat.1984@gmail.com> wrote:
> >>
> >> On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >>>> I am using ipset +  iptables to classify and not filters. Besides, if
> >>>> tc is allowing me to define qdisc -> classes -> qdsic -> classes
> >>>> (1,2,3 ...) sort of structure (ie like the one shown in ascii tree)
> >>>> then how can those lowest child classes be actually used or consumed?
> >>>
> >>> Just install tc filters on the lower level too.
> >>
> >> If I understand correctly, you are saying,
> >> instead of :
> >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> >> 0x00000001 fw flowid 1:10
> >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> >> 0x00000002 fw flowid 1:20
> >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> >> 0x00000003 fw flowid 2:10
> >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> >> 0x00000004 fw flowid 2:20
> >>
> >>
> >> I should do this: (i.e. changing parent to just immediate qdisc)
> >> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001
> >> fw flowid 1:10
> >> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002
> >> fw flowid 1:20
> >> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003
> >> fw flowid 2:10
> >> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004
> >> fw flowid 2:20
> >
> >
> > Yes, this is what I meant.
> >
> >
> >>
> >> I tried this previously. But there is not change in the result.
> >> Behaviour is exactly same, i.e. I am still getting 100Mbps and not
> >> 100kbps or 300kbps
> >>
> >> Besides, as I mentioned previously I am using ipset + skbprio and not
> >> filters stuff. Filters I used just to test.
> >>
> >> ipset  -N foo hash:ip,mark skbinfo
> >>
> >> ipset -A foo 10.10.10.10, 0x0x00000001 skbprio 1:10
> >> ipset -A foo 10.10.10.20, 0x0x00000002 skbprio 1:20
> >> ipset -A foo 10.10.10.30, 0x0x00000003 skbprio 2:10
> >> ipset -A foo 10.10.10.40, 0x0x00000004 skbprio 2:20
> >>
> >> iptables -A POSTROUTING -j SET --map-set foo dst,dst --map-prio
> >
> > Hmm..
> >
> > I am not familiar with ipset, but it seems to save the skbprio into
> > skb->priority, so it doesn't need TC filter to classify it again.
> >
> > I guess your packets might go to the direct queue of HTB, which
> > bypasses the token bucket. Can you dump the stats and check?
>
> With more than 64K 'classes' I suggest to use a single FQ qdisc [1], and
> an eBPF program using EDT model (Earliest Departure Time)

Although this is very cool, I think in this case the OP is acting as
a router, not a server?

> The BPF program would perform the classification, then find a data structure
> based on the 'class', and then update/maintain class virtual times and skb->tstamp
>
> TBF = bpf_map_lookup_elem(&map, &classid);
>
> uint64_t now = bpf_ktime_get_ns();
> uint64_t time_to_send = max(TBF->time_to_send, now);
>
> time_to_send += (u64)qdisc_pkt_len(skb) * NSEC_PER_SEC / TBF->rate;
> if (time_to_send > TBF->max_horizon) {
>     return TC_ACT_SHOT;
> }
> TBF->time_to_send = time_to_send;
> skb->tstamp = max(time_to_send, skb->tstamp);
> if (time_to_send - now > TBF->ecn_horizon)
>     bpf_skb_ecn_set_ce(skb);
> return TC_ACT_OK;
>
> tools/testing/selftests/bpf/progs/test_tc_edt.c shows something similar.
>
>
> [1]  MQ + FQ if the device is multi-queues.
>
>    Note that this setup scales very well on SMP, since we no longer are forced
>  to use a single HTB hierarchy (protected by a single spinlock)
>


-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740


* Re: Unable to create htb tc classes more than 64K
  2019-08-27 20:53                 ` Dave Taht
@ 2019-08-27 21:09                   ` Eric Dumazet
  2019-08-27 21:41                     ` Dave Taht
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2019-08-27 21:09 UTC (permalink / raw)
  To: Dave Taht, Eric Dumazet
  Cc: Cong Wang, Akshat Kakkar, Anton Danilov, NetFilter, lartc, netdev



On 8/27/19 10:53 PM, Dave Taht wrote:
> 
> Although this is very cool, I think in this case the OP is acting as
> a router, not a server?

This mechanism is generic. EDT was not designed for servers only.

One HTB class (with one associated qdisc per leaf) per rate limiter
does not scale, and consumes a _lot_ more memory.

We have abandoned HTB at Google for these reasons.

A nice thing about EDT is that you can stack an arbitrary number of rate
limiters and still keep a single queue (in FQ or another layer downstream).
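A rough user-space sketch of the stacking idea, not kernel or BPF code. It applies the per-class step from the pseudocode quoted earlier in the thread once per limiter and keeps the latest departure time, so the packet still ends up with a single timestamp for a single queue. The names edt_limiter and edt_stack are made up for this example, the rate is assumed to be in bytes per second, and the drop and ECN horizons are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-limiter state: a rate and a virtual clock. */
struct edt_limiter {
    uint64_t rate;          /* bytes per second (assumed unit) */
    uint64_t time_to_send;  /* ns; when this limiter is free again */
};

static uint64_t max_u64(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}

/* Run one packet of `len` bytes through every limiter in `chain`.
 * Each limiter advances its own clock; the packet's departure time
 * is the latest value any of them asks for. Returns the new tstamp. */
static uint64_t edt_stack(struct edt_limiter **chain, size_t n,
                          uint32_t len, uint64_t now, uint64_t tstamp)
{
    size_t i;

    assert(chain != NULL);
    for (i = 0; i < n; i++) {
        struct edt_limiter *l = chain[i];
        uint64_t t;

        assert(l != NULL && l->rate > 0);
        t = max_u64(l->time_to_send, now);
        t += (uint64_t)len * NSEC_PER_SEC / l->rate;
        l->time_to_send = t;
        tstamp = max_u64(tstamp, t);  /* same max() as in the pseudocode */
    }
    return tstamp;
}
```

With a 1000 B/s limiter stacked on a 500 B/s one, a 100-byte packet arriving at time 0 gets a departure time of 200 ms, because the slower limiter wins. No extra queue or class is created per limiter, only a few bytes of state.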



* Re: Unable to create htb tc classes more than 64K
  2019-08-27 21:09                   ` Eric Dumazet
@ 2019-08-27 21:41                     ` Dave Taht
  0 siblings, 0 replies; 16+ messages in thread
From: Dave Taht @ 2019-08-27 21:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cong Wang, Akshat Kakkar, Anton Danilov, NetFilter, lartc, netdev, bloat

On Tue, Aug 27, 2019 at 2:09 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 8/27/19 10:53 PM, Dave Taht wrote:
> >
> > Although this is very cool, I think in this case the OP is acting as
> > a router, not a server?
>
> This mechanism is generic. EDT was not designed for servers only.
>
> One HTB class (with one associated qdisc per leaf) per rate limiter
> does not scale, and consumes a _lot_ more memory.
>
> We have abandoned HTB at Google for these reasons.
>
> A nice thing about EDT is that you can stack an arbitrary number of rate
> limiters and still keep a single queue (in FQ or another layer downstream).

There are a lot of nice things about EDT! I'd followed along on the
theory (timer wheels, virtual clocks, etc.) and went looking for
Ethernet hardware that could do it directly on the low end, and came
up empty. Doing anything with the concept required a complete rethink
of everything we were already doing in wifi/fq_codel/cake ;(, and
after we shipped cake in 4.19, I bought a sailboat and logged out for
a while.

The biggest problem bufferbloat.net has left is more efficient inbound
shaping/policing on cheap hardware.

I don't suppose you've solved that already? :puppy dog eyes:

Maybe in next year's version of OpenWrt we can try to do something
coherent with EDT.



* Re: Unable to create htb tc classes more than 64K
  2019-08-26  6:32               ` Eric Dumazet
  2019-08-26  7:28                 ` Toke Høiland-Jørgensen
  2019-08-27 20:53                 ` Dave Taht
@ 2020-01-10 12:38                 ` Akshat Kakkar
  2 siblings, 0 replies; 16+ messages in thread
From: Akshat Kakkar @ 2020-01-10 12:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Cong Wang, Anton Danilov, NetFilter, lartc, netdev

Hi Eric,

Thanks for the detailed reply. Sorry I couldn't respond sooner; I was
completely bedridden.

To try this, I need a few inputs (as I am new to all this):

1. How do I register my eBPF program with the kernel so that it gets called? Are
https://netdevconf.info/1.1/proceedings/papers/On-getting-tc-classifier-fully-programmable-with-cls-bpf.pdf
and
http://man7.org/linux/man-pages/man8/tc-bpf.8.html
the correct documents?
2. Some information on EDT and skb->tstamp, and how things work.

On Mon, Aug 26, 2019 at 12:02 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 8/25/19 7:52 PM, Cong Wang wrote:
> > On Wed, Aug 21, 2019 at 11:00 PM Akshat Kakkar <akshat.1984@gmail.com> wrote:
> >>
> >> On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >>>> I am using ipset + iptables to classify, not filters. Besides, if
> >>>> tc allows me to define a qdisc -> classes -> qdisc -> classes
> >>>> (1,2,3 ...) sort of structure (i.e. like the one shown in the ascii tree)
> >>>> then how can those lowest child classes actually be used or consumed?
> >>>
> >>> Just install tc filters on the lower level too.
> >>
> >> If I understand correctly, you are saying,
> >> instead of :
> >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> >> 0x00000001 fw flowid 1:10
> >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> >> 0x00000002 fw flowid 1:20
> >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> >> 0x00000003 fw flowid 2:10
> >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> >> 0x00000004 fw flowid 2:20
> >>
> >>
> >> I should do this: (i.e. changing parent to just immediate qdisc)
> >> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001
> >> fw flowid 1:10
> >> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002
> >> fw flowid 1:20
> >> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003
> >> fw flowid 2:10
> >> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004
> >> fw flowid 2:20
> >
> >
> > Yes, this is what I meant.
> >
> >
> >>
> >> I tried this previously, but there is no change in the result.
> >> The behaviour is exactly the same, i.e. I am still getting 100Mbps
> >> and not 100kbps or 300kbps
> >>
> >> Besides, as I mentioned previously I am using ipset + skbprio and not
> >> filters stuff. Filters I used just to test.
> >>
> >> ipset  -N foo hash:ip,mark skbinfo
> >>
> >> ipset -A foo 10.10.10.10,0x00000001 skbprio 1:10
> >> ipset -A foo 10.10.10.20,0x00000002 skbprio 1:20
> >> ipset -A foo 10.10.10.30,0x00000003 skbprio 2:10
> >> ipset -A foo 10.10.10.40,0x00000004 skbprio 2:20
> >>
> >> iptables -A POSTROUTING -j SET --map-set foo dst,dst --map-prio
> >
> > Hmm..
> >
> > I am not familiar with ipset, but it seems to save the skbprio into
> > skb->priority, so it doesn't need TC filter to classify it again.
> >
> > I guess your packets might go to the direct queue of HTB, which
> > bypasses the token bucket. Can you dump the stats and check?
>
> With more than 64K 'classes' I suggest using a single FQ qdisc [1], and
> an eBPF program using the EDT model (Earliest Departure Time)
>
> The BPF program would perform the classification, then find a data structure
> based on the 'class', and then update/maintain class virtual times and skb->tstamp
>
> TBF = bpf_map_lookup_elem(&map, &classid);
>
> uint64_t now = bpf_ktime_get_ns();
> uint64_t time_to_send = max(TBF->time_to_send, now);
>
> time_to_send += (u64)qdisc_pkt_len(skb) * NSEC_PER_SEC / TBF->rate;
> if (time_to_send > TBF->max_horizon) {
>     return TC_ACT_SHOT;
> }
> TBF->time_to_send = time_to_send;
> skb->tstamp = max(time_to_send, skb->tstamp);
> if (time_to_send - now > TBF->ecn_horizon)
>     bpf_skb_ecn_set_ce(skb);
> return TC_ACT_OK;
>
> tools/testing/selftests/bpf/progs/test_tc_edt.c shows something similar.
>
>
> [1]  MQ + FQ if the device is multi-queues.
>
>    Note that this setup scales very well on SMP, since we no longer are forced
>  to use a single HTB hierarchy (protected by a single spinlock)
>


Thread overview: 16+ messages
2019-08-16 12:48 Unable to create htb tc classes more than 64K Akshat Kakkar
2019-08-16 17:45 ` Cong Wang
2019-08-17 12:46   ` Akshat Kakkar
2019-08-17 18:24     ` Cong Wang
2019-08-17 19:04       ` Akshat Kakkar
2019-08-20  6:26         ` Akshat Kakkar
2019-08-21 22:06         ` Cong Wang
2019-08-22  5:59           ` Akshat Kakkar
2019-08-25 17:52             ` Cong Wang
2019-08-26  6:32               ` Eric Dumazet
2019-08-26  7:28                 ` Toke Høiland-Jørgensen
2019-08-27 20:53                 ` Dave Taht
2019-08-27 21:09                   ` Eric Dumazet
2019-08-27 21:41                     ` Dave Taht
2020-01-10 12:38                 ` Akshat Kakkar
2019-08-26 16:45         ` Jesper Dangaard Brouer
