* Unable to create htb tc classes more than 64K @ 2019-08-16 12:48 Akshat Kakkar 2019-08-16 17:45 ` Cong Wang 0 siblings, 1 reply; 16+ messages in thread From: Akshat Kakkar @ 2019-08-16 12:48 UTC (permalink / raw) To: netfilter-devel, lartc, netdev

I want to have around 1 million htb tc classes. The flat structure of htb tc classes allows only 64K classes at once, but it should be possible to go beyond that using a hierarchy of qdiscs and classes. For this I tried something like this:

tc qdisc add dev eno2 root handle 100: htb
tc class add dev eno2 parent 100: classid 100:1 htb rate 100Mbps
tc class add dev eno2 parent 100: classid 100:2 htb rate 100Mbps

tc qdisc add dev eno2 parent 100:1 handle 1: htb
tc class add dev eno2 parent 1: classid 1:10 htb rate 100kbps
tc class add dev eno2 parent 1: classid 1:20 htb rate 300kbps

tc qdisc add dev eno2 parent 100:2 handle 2: htb
tc class add dev eno2 parent 2: classid 2:10 htb rate 100kbps
tc class add dev eno2 parent 2: classid 2:20 htb rate 300kbps

What I want is something like:

tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000001 fw flowid 1:10
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000002 fw flowid 1:20
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000003 fw flowid 2:10
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000004 fw flowid 2:20

But I am unable to shape my traffic with any of 1:10, 1:20, 2:10 or 2:20.

Can you please suggest where it is going wrong? Is it not possible altogether?

-Akshat

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K 2019-08-16 12:48 Unable to create htb tc classes more than 64K Akshat Kakkar @ 2019-08-16 17:45 ` Cong Wang 2019-08-17 12:46 ` Akshat Kakkar 0 siblings, 1 reply; 16+ messages in thread From: Cong Wang @ 2019-08-16 17:45 UTC (permalink / raw) To: Akshat Kakkar; +Cc: NetFilter, lartc, netdev

On Fri, Aug 16, 2019 at 5:49 AM Akshat Kakkar <akshat.1984@gmail.com> wrote:
>
> I want to have around 1 million htb tc classes.
> The flat structure of htb tc classes allows only 64K classes at once.

This is probably due to the limit of the class ID, whose minor number is only 16 bits.

> But it should be possible to go beyond that using a hierarchy of
> qdiscs and classes. For this I tried something like this:
>
> tc qdisc add dev eno2 root handle 100: htb
> tc class add dev eno2 parent 100: classid 100:1 htb rate 100Mbps
> tc class add dev eno2 parent 100: classid 100:2 htb rate 100Mbps
>
> tc qdisc add dev eno2 parent 100:1 handle 1: htb
> tc class add dev eno2 parent 1: classid 1:10 htb rate 100kbps
> tc class add dev eno2 parent 1: classid 1:20 htb rate 300kbps
>
> tc qdisc add dev eno2 parent 100:2 handle 2: htb
> tc class add dev eno2 parent 2: classid 2:10 htb rate 100kbps
> tc class add dev eno2 parent 2: classid 2:20 htb rate 300kbps
>
> What I want is something like:
>
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000001 fw flowid 1:10
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000002 fw flowid 1:20
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000003 fw flowid 2:10
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000004 fw flowid 2:20
>
> But I am unable to shape my traffic with any of 1:10, 1:20, 2:10 or 2:20.
>
> Can you please suggest where it is going wrong?
> Is it not possible altogether?

A filter can only select classes at the level of the qdisc it is attached to; you are trying to direct packets into child classes one level down, which doesn't work.

Thanks.
* Re: Unable to create htb tc classes more than 64K 2019-08-16 17:45 ` Cong Wang @ 2019-08-17 12:46 ` Akshat Kakkar 2019-08-17 18:24 ` Cong Wang 0 siblings, 1 reply; 16+ messages in thread From: Akshat Kakkar @ 2019-08-17 12:46 UTC (permalink / raw) To: Cong Wang; +Cc: NetFilter, lartc, netdev

I agree that it is because of the 16-bit minor ID of the class, which restricts it to 64K.
Point is, can we use multilevel qdiscs and classes to extend it to a larger number of classes, i.e. to more than 64K classes?

One scheme can be like:

                100:                 root qdisc
              /  |  \
             /   |   \
            /    |    \
        100:1  100:2  100:3          child classes
          |      |      |
          1:     2:     3:           qdisc
         / \    / \    / \
       1:1 1:2       3:1 3:2         leaf classes

with all qdiscs and classes defined as htb.

Is this the correct approach? Any alternative?

Besides, in order to direct traffic to the leaf classes 1:1, 1:2, 2:1, 2:2, 3:1, 3:2 ..., instead of using filters I am using ipset with skbprio and an iptables map-set match rule.
But even after all this it doesn't work. Why? What am I missing?

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K 2019-08-17 12:46 ` Akshat Kakkar @ 2019-08-17 18:24 ` Cong Wang 2019-08-17 19:04 ` Akshat Kakkar 0 siblings, 1 reply; 16+ messages in thread From: Cong Wang @ 2019-08-17 18:24 UTC (permalink / raw) To: Akshat Kakkar; +Cc: NetFilter, lartc, netdev

On Sat, Aug 17, 2019 at 5:46 AM Akshat Kakkar <akshat.1984@gmail.com> wrote:
>
> I agree that it is because of the 16-bit minor ID of the class, which
> restricts it to 64K.
> Point is, can we use multilevel qdiscs and classes to extend it to a
> larger number of classes, i.e. to more than 64K classes?

If your goal is merely having as many classes as you can, then yes.

> One scheme can be like:
>
>                 100:                 root qdisc
>               /  |  \
>              /   |   \
>             /    |    \
>         100:1  100:2  100:3          child classes
>           |      |      |
>           1:     2:     3:           qdisc
>          / \    / \    / \
>        1:1 1:2       3:1 3:2         leaf classes
>
> with all qdiscs and classes defined as htb.
>
> Is this the correct approach? Any alternative?

Again, depends on what your goal is.

> Besides, in order to direct traffic to the leaf classes 1:1, 1:2, 2:1,
> 2:2, 3:1, 3:2 ..., instead of using filters I am using ipset with
> skbprio and an iptables map-set match rule.
> But even after all this it doesn't work. Why?

Again, the filters you use to classify the packets can only select classes on the same level, not the next level.

Thanks.

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K 2019-08-17 18:24 ` Cong Wang @ 2019-08-17 19:04 ` Akshat Kakkar 2019-08-20 6:26 ` Akshat Kakkar ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Akshat Kakkar @ 2019-08-17 19:04 UTC (permalink / raw) To: Cong Wang; +Cc: NetFilter, lartc, netdev

On Sat, Aug 17, 2019 at 11:54 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Sat, Aug 17, 2019 at 5:46 AM Akshat Kakkar <akshat.1984@gmail.com> wrote:
> >
> > I agree that it is because of the 16-bit minor ID of the class, which
> > restricts it to 64K.
> > Point is, can we use multilevel qdiscs and classes to extend it to a
> > larger number of classes, i.e. to more than 64K classes?
>
> If your goal is merely having as many classes as you can, then yes.

My goal is not just to make as many classes as possible, but also to use them to do rate limiting per IP per server. Say I have a list of 10000 IPs and more than 100 servers. Simply put, I want a few IPs to get a speed of, say, 1Mbps per server, but others a speed of, say, 2Mbps per server. How can I achieve this without having 10000 x 100 classes? These numbers can be larger than this, and hence I am looking for a generic solution.

> > One scheme can be like:
> >
> >                 100:                 root qdisc
> >               /  |  \
> >              /   |   \
> >             /    |    \
> >         100:1  100:2  100:3          child classes
> >           |      |      |
> >           1:     2:     3:           qdisc
> >          / \    / \    / \
> >        1:1 1:2       3:1 3:2         leaf classes
> >
> > with all qdiscs and classes defined as htb.
> >
> > Is this the correct approach? Any alternative?
>
> Again, depends on what your goal is.
>
> > Besides, in order to direct traffic to the leaf classes 1:1, 1:2, 2:1,
> > 2:2, 3:1, 3:2 ..., instead of using filters I am using ipset with
> > skbprio and an iptables map-set match rule.
> > But even after all this it doesn't work. Why?
>
> Again, the filters you use to classify the packets can only
> select classes on the same level, not the next level.
I am using ipset + iptables to classify, not filters. Besides, if tc is allowing me to define a qdisc -> classes -> qdisc -> classes (1, 2, 3 ...) sort of structure (i.e. like the one shown in the ascii tree), then how can those lowest child classes actually be used or consumed?

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K 2019-08-17 19:04 ` Akshat Kakkar @ 2019-08-20 6:26 ` Akshat Kakkar 2019-08-21 22:06 ` Cong Wang 2019-08-26 16:45 ` Jesper Dangaard Brouer 2 siblings, 0 replies; 16+ messages in thread From: Akshat Kakkar @ 2019-08-20 6:26 UTC (permalink / raw) To: Cong Wang, Anton Danilov; +Cc: NetFilter, lartc, netdev

>> If your goal is merely having as many classes as you can, then yes.

My goal is not just to make as many classes as possible, but also to use them to do rate limiting per IP per server. Say I have a list of 10000 IPs and more than 100 servers. Simply put, I want a few IPs to get a speed of, say, 1Mbps per server, but others a speed of, say, 2Mbps per server. How can I achieve this without having 10000 x 100 classes? These numbers can be larger than this, and hence I am looking for a generic solution.

I am using ipset + iptables to classify, not filters. Besides, if tc is allowing me to define a qdisc (100:) -> classes (100:1) -> qdiscs (1: 2: 3:) -> classes (1:1, 1:2, 2:1, 2:2, 3:1, 3:2 ...) sort of structure (i.e. like the one shown in the ascii tree), then how should those lowest child classes actually be used or consumed, and where?

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K 2019-08-17 19:04 ` Akshat Kakkar 2019-08-20 6:26 ` Akshat Kakkar @ 2019-08-21 22:06 ` Cong Wang 2019-08-22 5:59 ` Akshat Kakkar 2019-08-26 16:45 ` Jesper Dangaard Brouer 2 siblings, 1 reply; 16+ messages in thread From: Cong Wang @ 2019-08-21 22:06 UTC (permalink / raw) To: Akshat Kakkar; +Cc: NetFilter, lartc, netdev On Sat, Aug 17, 2019 at 12:04 PM Akshat Kakkar <akshat.1984@gmail.com> wrote: > I am using ipset + iptables to classify and not filters. Besides, if > tc is allowing me to define qdisc -> classes -> qdsic -> classes > (1,2,3 ...) sort of structure (ie like the one shown in ascii tree) > then how can those lowest child classes be actually used or consumed? Just install tc filters on the lower level too. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K 2019-08-21 22:06 ` Cong Wang @ 2019-08-22 5:59 ` Akshat Kakkar 2019-08-25 17:52 ` Cong Wang 0 siblings, 1 reply; 16+ messages in thread From: Akshat Kakkar @ 2019-08-22 5:59 UTC (permalink / raw) To: Cong Wang, Anton Danilov; +Cc: NetFilter, lartc, netdev

On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > I am using ipset + iptables to classify, not filters. Besides, if
> > tc is allowing me to define a qdisc -> classes -> qdisc -> classes
> > (1, 2, 3 ...) sort of structure (i.e. like the one shown in the ascii
> > tree), then how can those lowest child classes actually be used or consumed?
>
> Just install tc filters on the lower level too.

If I understand correctly, you are saying that instead of:

tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000001 fw flowid 1:10
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000002 fw flowid 1:20
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000003 fw flowid 2:10
tc filter add dev eno2 parent 100: protocol ip prio 1 handle 0x00000004 fw flowid 2:20

I should do this (i.e. change the parent to the immediate qdisc):

tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001 fw flowid 1:10
tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002 fw flowid 1:20
tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003 fw flowid 2:10
tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004 fw flowid 2:20

I tried this previously, but there is no change in the result. The behaviour is exactly the same, i.e. I am still getting 100Mbps and not 100kbps or 300kbps.

Besides, as I mentioned previously, I am using ipset + skbprio and not the filters stuff. Filters I used just to test.
ipset -N foo hash:ip,mark skbinfo

ipset -A foo 10.10.10.10,0x00000001 skbprio 1:10
ipset -A foo 10.10.10.20,0x00000002 skbprio 1:20
ipset -A foo 10.10.10.30,0x00000003 skbprio 2:10
ipset -A foo 10.10.10.40,0x00000004 skbprio 2:20

iptables -A POSTROUTING -j SET --map-set foo dst,dst --map-prio

That's why I added @Anton Danilov in cc, so that he can have a look, as he designed this skbprio thing in ipset and thus would have a better idea.

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K 2019-08-22 5:59 ` Akshat Kakkar @ 2019-08-25 17:52 ` Cong Wang 2019-08-26 6:32 ` Eric Dumazet 0 siblings, 1 reply; 16+ messages in thread From: Cong Wang @ 2019-08-25 17:52 UTC (permalink / raw) To: Akshat Kakkar; +Cc: Anton Danilov, NetFilter, lartc, netdev On Wed, Aug 21, 2019 at 11:00 PM Akshat Kakkar <akshat.1984@gmail.com> wrote: > > On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote: > > > I am using ipset + iptables to classify and not filters. Besides, if > > > tc is allowing me to define qdisc -> classes -> qdsic -> classes > > > (1,2,3 ...) sort of structure (ie like the one shown in ascii tree) > > > then how can those lowest child classes be actually used or consumed? > > > > Just install tc filters on the lower level too. > > If I understand correctly, you are saying, > instead of : > tc filter add dev eno2 parent 100: protocol ip prio 1 handle > 0x00000001 fw flowid 1:10 > tc filter add dev eno2 parent 100: protocol ip prio 1 handle > 0x00000002 fw flowid 1:20 > tc filter add dev eno2 parent 100: protocol ip prio 1 handle > 0x00000003 fw flowid 2:10 > tc filter add dev eno2 parent 100: protocol ip prio 1 handle > 0x00000004 fw flowid 2:20 > > > I should do this: (i.e. changing parent to just immediate qdisc) > tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001 > fw flowid 1:10 > tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002 > fw flowid 1:20 > tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003 > fw flowid 2:10 > tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004 > fw flowid 2:20 Yes, this is what I meant. > > I tried this previously. But there is not change in the result. > Behaviour is exactly same, i.e. I am still getting 100Mbps and not > 100kbps or 300kbps > > Besides, as I mentioned previously I am using ipset + skbprio and not > filters stuff. Filters I used just to test. 
> > ipset -N foo hash:ip,mark skbinfo > > ipset -A foo 10.10.10.10, 0x0x00000001 skbprio 1:10 > ipset -A foo 10.10.10.20, 0x0x00000002 skbprio 1:20 > ipset -A foo 10.10.10.30, 0x0x00000003 skbprio 2:10 > ipset -A foo 10.10.10.40, 0x0x00000004 skbprio 2:20 > > iptables -A POSTROUTING -j SET --map-set foo dst,dst --map-prio Hmm.. I am not familiar with ipset, but it seems to save the skbprio into skb->priority, so it doesn't need TC filter to classify it again. I guess your packets might go to the direct queue of HTB, which bypasses the token bucket. Can you dump the stats and check? Thanks. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K 2019-08-25 17:52 ` Cong Wang @ 2019-08-26 6:32 ` Eric Dumazet 2019-08-26 7:28 ` Toke Høiland-Jørgensen ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Eric Dumazet @ 2019-08-26 6:32 UTC (permalink / raw) To: Cong Wang, Akshat Kakkar; +Cc: Anton Danilov, NetFilter, lartc, netdev On 8/25/19 7:52 PM, Cong Wang wrote: > On Wed, Aug 21, 2019 at 11:00 PM Akshat Kakkar <akshat.1984@gmail.com> wrote: >> >> On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote: >>>> I am using ipset + iptables to classify and not filters. Besides, if >>>> tc is allowing me to define qdisc -> classes -> qdsic -> classes >>>> (1,2,3 ...) sort of structure (ie like the one shown in ascii tree) >>>> then how can those lowest child classes be actually used or consumed? >>> >>> Just install tc filters on the lower level too. >> >> If I understand correctly, you are saying, >> instead of : >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle >> 0x00000001 fw flowid 1:10 >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle >> 0x00000002 fw flowid 1:20 >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle >> 0x00000003 fw flowid 2:10 >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle >> 0x00000004 fw flowid 2:20 >> >> >> I should do this: (i.e. changing parent to just immediate qdisc) >> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001 >> fw flowid 1:10 >> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002 >> fw flowid 1:20 >> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003 >> fw flowid 2:10 >> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004 >> fw flowid 2:20 > > > Yes, this is what I meant. > > >> >> I tried this previously. But there is not change in the result. >> Behaviour is exactly same, i.e. 
I am still getting 100Mbps and not 100kbps or 300kbps.
>>
>> Besides, as I mentioned previously, I am using ipset + skbprio and not
>> the filters stuff. Filters I used just to test.
>>
>> ipset -N foo hash:ip,mark skbinfo
>>
>> ipset -A foo 10.10.10.10,0x00000001 skbprio 1:10
>> ipset -A foo 10.10.10.20,0x00000002 skbprio 1:20
>> ipset -A foo 10.10.10.30,0x00000003 skbprio 2:10
>> ipset -A foo 10.10.10.40,0x00000004 skbprio 2:20
>>
>> iptables -A POSTROUTING -j SET --map-set foo dst,dst --map-prio
>
> Hmm..
>
> I am not familiar with ipset, but it seems to save the skbprio into
> skb->priority, so it doesn't need a TC filter to classify it again.
>
> I guess your packets might go to the direct queue of HTB, which
> bypasses the token bucket. Can you dump the stats and check?

With more than 64K 'classes' I suggest using a single FQ qdisc [1], and an eBPF program using the EDT model (Earliest Departure Time).

The BPF program would perform the classification, then find a data structure based on the 'class', and then update/maintain the class virtual times and skb->tstamp:

TBF = bpf_map_lookup_elem(&map, &classid);

uint64_t now = bpf_ktime_get_ns();
uint64_t time_to_send = max(TBF->time_to_send, now);

time_to_send += (u64)qdisc_pkt_len(skb) * NSEC_PER_SEC / TBF->rate;
if (time_to_send > TBF->max_horizon) {
        return TC_ACT_SHOT;
}
TBF->time_to_send = time_to_send;
skb->tstamp = max(time_to_send, skb->tstamp);
if (time_to_send - now > TBF->ecn_horizon)
        bpf_skb_ecn_set_ce(skb);
return TC_ACT_OK;

tools/testing/selftests/bpf/progs/test_tc_edt.c shows something similar.

[1] MQ + FQ if the device is multi-queue.

Note that this setup scales very well on SMP, since we are no longer forced to use a single HTB hierarchy (protected by a single spinlock).

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K 2019-08-26 6:32 ` Eric Dumazet @ 2019-08-26 7:28 ` Toke Høiland-Jørgensen 2019-08-27 20:53 ` Dave Taht 2020-01-10 12:38 ` Akshat Kakkar 2 siblings, 0 replies; 16+ messages in thread From: Toke Høiland-Jørgensen @ 2019-08-26 7:28 UTC (permalink / raw) To: Eric Dumazet, Cong Wang, Akshat Kakkar Cc: Anton Danilov, NetFilter, lartc, netdev Eric Dumazet <eric.dumazet@gmail.com> writes: > On 8/25/19 7:52 PM, Cong Wang wrote: >> On Wed, Aug 21, 2019 at 11:00 PM Akshat Kakkar <akshat.1984@gmail.com> wrote: >>> >>> On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote: >>>>> I am using ipset + iptables to classify and not filters. Besides, if >>>>> tc is allowing me to define qdisc -> classes -> qdsic -> classes >>>>> (1,2,3 ...) sort of structure (ie like the one shown in ascii tree) >>>>> then how can those lowest child classes be actually used or consumed? >>>> >>>> Just install tc filters on the lower level too. >>> >>> If I understand correctly, you are saying, >>> instead of : >>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle >>> 0x00000001 fw flowid 1:10 >>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle >>> 0x00000002 fw flowid 1:20 >>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle >>> 0x00000003 fw flowid 2:10 >>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle >>> 0x00000004 fw flowid 2:20 >>> >>> >>> I should do this: (i.e. changing parent to just immediate qdisc) >>> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001 >>> fw flowid 1:10 >>> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002 >>> fw flowid 1:20 >>> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003 >>> fw flowid 2:10 >>> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004 >>> fw flowid 2:20 >> >> >> Yes, this is what I meant. >> >> >>> >>> I tried this previously. 
But there is not change in the result. >>> Behaviour is exactly same, i.e. I am still getting 100Mbps and not >>> 100kbps or 300kbps >>> >>> Besides, as I mentioned previously I am using ipset + skbprio and not >>> filters stuff. Filters I used just to test. >>> >>> ipset -N foo hash:ip,mark skbinfo >>> >>> ipset -A foo 10.10.10.10, 0x0x00000001 skbprio 1:10 >>> ipset -A foo 10.10.10.20, 0x0x00000002 skbprio 1:20 >>> ipset -A foo 10.10.10.30, 0x0x00000003 skbprio 2:10 >>> ipset -A foo 10.10.10.40, 0x0x00000004 skbprio 2:20 >>> >>> iptables -A POSTROUTING -j SET --map-set foo dst,dst --map-prio >> >> Hmm.. >> >> I am not familiar with ipset, but it seems to save the skbprio into >> skb->priority, so it doesn't need TC filter to classify it again. >> >> I guess your packets might go to the direct queue of HTB, which >> bypasses the token bucket. Can you dump the stats and check? > > With more than 64K 'classes' I suggest to use a single FQ qdisc [1], and > an eBPF program using EDT model (Earliest Departure Time) > > The BPF program would perform the classification, then find a data structure > based on the 'class', and then update/maintain class virtual times and skb->tstamp > > TBF = bpf_map_lookup_elem(&map, &classid); > > uint64_t now = bpf_ktime_get_ns(); > uint64_t time_to_send = max(TBF->time_to_send, now); > > time_to_send += (u64)qdisc_pkt_len(skb) * NSEC_PER_SEC / TBF->rate; > if (time_to_send > TBF->max_horizon) { > return TC_ACT_SHOT; > } > TBF->time_to_send = time_to_send; > skb->tstamp = max(time_to_send, skb->tstamp); > if (time_to_send - now > TBF->ecn_horizon) > bpf_skb_ecn_set_ce(skb); > return TC_ACT_OK; > > tools/testing/selftests/bpf/progs/test_tc_edt.c shows something similar. > > > [1] MQ + FQ if the device is multi-queues. > > Note that this setup scales very well on SMP, since we no longer are forced > to use a single HTB hierarchy (protected by a single spinlock) Wow, this is very cool! 
Thanks for that walk-through, Eric :) -Toke ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K 2019-08-26 6:32 ` Eric Dumazet 2019-08-26 7:28 ` Toke Høiland-Jørgensen @ 2019-08-27 20:53 ` Dave Taht 2019-08-27 21:09 ` Eric Dumazet 2020-01-10 12:38 ` Akshat Kakkar 2 siblings, 1 reply; 16+ messages in thread From: Dave Taht @ 2019-08-27 20:53 UTC (permalink / raw) To: Eric Dumazet Cc: Cong Wang, Akshat Kakkar, Anton Danilov, NetFilter, lartc, netdev On Sun, Aug 25, 2019 at 11:47 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > On 8/25/19 7:52 PM, Cong Wang wrote: > > On Wed, Aug 21, 2019 at 11:00 PM Akshat Kakkar <akshat.1984@gmail.com> wrote: > >> > >> On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote: > >>>> I am using ipset + iptables to classify and not filters. Besides, if > >>>> tc is allowing me to define qdisc -> classes -> qdsic -> classes > >>>> (1,2,3 ...) sort of structure (ie like the one shown in ascii tree) > >>>> then how can those lowest child classes be actually used or consumed? > >>> > >>> Just install tc filters on the lower level too. > >> > >> If I understand correctly, you are saying, > >> instead of : > >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle > >> 0x00000001 fw flowid 1:10 > >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle > >> 0x00000002 fw flowid 1:20 > >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle > >> 0x00000003 fw flowid 2:10 > >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle > >> 0x00000004 fw flowid 2:20 > >> > >> > >> I should do this: (i.e. 
changing parent to just immediate qdisc) > >> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001 > >> fw flowid 1:10 > >> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002 > >> fw flowid 1:20 > >> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003 > >> fw flowid 2:10 > >> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004 > >> fw flowid 2:20 > > > > > > Yes, this is what I meant. > > > > > >> > >> I tried this previously. But there is not change in the result. > >> Behaviour is exactly same, i.e. I am still getting 100Mbps and not > >> 100kbps or 300kbps > >> > >> Besides, as I mentioned previously I am using ipset + skbprio and not > >> filters stuff. Filters I used just to test. > >> > >> ipset -N foo hash:ip,mark skbinfo > >> > >> ipset -A foo 10.10.10.10, 0x0x00000001 skbprio 1:10 > >> ipset -A foo 10.10.10.20, 0x0x00000002 skbprio 1:20 > >> ipset -A foo 10.10.10.30, 0x0x00000003 skbprio 2:10 > >> ipset -A foo 10.10.10.40, 0x0x00000004 skbprio 2:20 > >> > >> iptables -A POSTROUTING -j SET --map-set foo dst,dst --map-prio > > > > Hmm.. > > > > I am not familiar with ipset, but it seems to save the skbprio into > > skb->priority, so it doesn't need TC filter to classify it again. > > > > I guess your packets might go to the direct queue of HTB, which > > bypasses the token bucket. Can you dump the stats and check? > > With more than 64K 'classes' I suggest to use a single FQ qdisc [1], and > an eBPF program using EDT model (Earliest Departure Time) Although this is very cool, I think in this case the OP is being a router, not server? 
> The BPF program would perform the classification, then find a data structure > based on the 'class', and then update/maintain class virtual times and skb->tstamp > > TBF = bpf_map_lookup_elem(&map, &classid); > > uint64_t now = bpf_ktime_get_ns(); > uint64_t time_to_send = max(TBF->time_to_send, now); > > time_to_send += (u64)qdisc_pkt_len(skb) * NSEC_PER_SEC / TBF->rate; > if (time_to_send > TBF->max_horizon) { > return TC_ACT_SHOT; > } > TBF->time_to_send = time_to_send; > skb->tstamp = max(time_to_send, skb->tstamp); > if (time_to_send - now > TBF->ecn_horizon) > bpf_skb_ecn_set_ce(skb); > return TC_ACT_OK; > > tools/testing/selftests/bpf/progs/test_tc_edt.c shows something similar. > > > [1] MQ + FQ if the device is multi-queues. > > Note that this setup scales very well on SMP, since we no longer are forced > to use a single HTB hierarchy (protected by a single spinlock) > -- Dave Täht CTO, TekLibre, LLC http://www.teklibre.com Tel: 1-831-205-9740 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K 2019-08-27 20:53 ` Dave Taht @ 2019-08-27 21:09 ` Eric Dumazet 2019-08-27 21:41 ` Dave Taht 0 siblings, 1 reply; 16+ messages in thread From: Eric Dumazet @ 2019-08-27 21:09 UTC (permalink / raw) To: Dave Taht, Eric Dumazet Cc: Cong Wang, Akshat Kakkar, Anton Danilov, NetFilter, lartc, netdev On 8/27/19 10:53 PM, Dave Taht wrote: > > Although this is very cool, I think in this case the OP is being > a router, not server? This mechanism is generic. EDT has not been designed for servers only. One HTB class (with one associated qdisc per leaf) per rate limiter does not scale, and consumes a _lot_ more memory. We have abandoned HTB at Google for these reasons. Nice thing with EDT is that you can stack arbitrary number of rate limiters, and still keep a single queue (in FQ or another layer downstream) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K 2019-08-27 21:09 ` Eric Dumazet @ 2019-08-27 21:41 ` Dave Taht 0 siblings, 0 replies; 16+ messages in thread From: Dave Taht @ 2019-08-27 21:41 UTC (permalink / raw) To: Eric Dumazet Cc: Cong Wang, Akshat Kakkar, Anton Danilov, NetFilter, lartc, netdev, bloat On Tue, Aug 27, 2019 at 2:09 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > On 8/27/19 10:53 PM, Dave Taht wrote: > > > > Although this is very cool, I think in this case the OP is being > > a router, not server? > > This mechanism is generic. EDT has not been designed for servers only. > > One HTB class (with one associated qdisc per leaf) per rate limiter > does not scale, and consumes a _lot_ more memory. > > We have abandoned HTB at Google for these reasons. > > Nice thing with EDT is that you can stack arbitrary number of rate limiters, > and still keep a single queue (in FQ or another layer downstream) There's a lot of nice things about EDT! I'd followed along on the theory, timerwheels, virtual clocks, etc, and went seeking ethernet hw that could do it (directly) on the low end and came up empty - and doing anything with the concept required a complete rethink on everything we were already doing in wifi/fq_codel/cake ;(, and after we shipped cake in 4.19, I bought a sailboat, and logged out for a while. The biggest problem bufferbloat.net has left is more efficient inbound shaping/policing on cheap hw. I don't suppose you've solved that already? :puppy dog eyes: Next year's version of openwrt we can maybe try to do something coherent with EDT. > -- Dave Täht CTO, TekLibre, LLC http://www.teklibre.com Tel: 1-831-205-9740 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K 2019-08-26 6:32 ` Eric Dumazet 2019-08-26 7:28 ` Toke Høiland-Jørgensen 2019-08-27 20:53 ` Dave Taht @ 2020-01-10 12:38 ` Akshat Kakkar 2 siblings, 0 replies; 16+ messages in thread From: Akshat Kakkar @ 2020-01-10 12:38 UTC (permalink / raw) To: Eric Dumazet; +Cc: Cong Wang, Anton Danilov, NetFilter, lartc, netdev Hi Eric, Thanks for a detailed reply. Sorry I couldn't reply as I was completely bed ridden. In order for me to try this, I require few inputs (as I am new to all this)... 1. How do I register in Kernel, that my eBPF program should be called? Is this https://netdevconf.info/1.1/proceedings/papers/On-getting-tc-classifier-fully-programmable-with-cls-bpf.pdf and http://man7.org/linux/man-pages/man8/tc-bpf.8.html correct documents ? 2. Some info with respect to EDT and skb->tstamp and how things work. On Mon, Aug 26, 2019 at 12:02 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > On 8/25/19 7:52 PM, Cong Wang wrote: > > On Wed, Aug 21, 2019 at 11:00 PM Akshat Kakkar <akshat.1984@gmail.com> wrote: > >> > >> On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote: > >>>> I am using ipset + iptables to classify and not filters. Besides, if > >>>> tc is allowing me to define qdisc -> classes -> qdsic -> classes > >>>> (1,2,3 ...) sort of structure (ie like the one shown in ascii tree) > >>>> then how can those lowest child classes be actually used or consumed? > >>> > >>> Just install tc filters on the lower level too. 
> >> > >> If I understand correctly, you are saying, > >> instead of : > >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle > >> 0x00000001 fw flowid 1:10 > >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle > >> 0x00000002 fw flowid 1:20 > >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle > >> 0x00000003 fw flowid 2:10 > >> tc filter add dev eno2 parent 100: protocol ip prio 1 handle > >> 0x00000004 fw flowid 2:20 > >> > >> > >> I should do this: (i.e. changing parent to just immediate qdisc) > >> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001 > >> fw flowid 1:10 > >> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002 > >> fw flowid 1:20 > >> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003 > >> fw flowid 2:10 > >> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004 > >> fw flowid 2:20 > > > > > > Yes, this is what I meant. > > > > > >> > >> I tried this previously. But there is not change in the result. > >> Behaviour is exactly same, i.e. I am still getting 100Mbps and not > >> 100kbps or 300kbps > >> > >> Besides, as I mentioned previously I am using ipset + skbprio and not > >> filters stuff. Filters I used just to test. > >> > >> ipset -N foo hash:ip,mark skbinfo > >> > >> ipset -A foo 10.10.10.10, 0x0x00000001 skbprio 1:10 > >> ipset -A foo 10.10.10.20, 0x0x00000002 skbprio 1:20 > >> ipset -A foo 10.10.10.30, 0x0x00000003 skbprio 2:10 > >> ipset -A foo 10.10.10.40, 0x0x00000004 skbprio 2:20 > >> > >> iptables -A POSTROUTING -j SET --map-set foo dst,dst --map-prio > > > > Hmm.. > > > > I am not familiar with ipset, but it seems to save the skbprio into > > skb->priority, so it doesn't need TC filter to classify it again. > > > > I guess your packets might go to the direct queue of HTB, which > > bypasses the token bucket. Can you dump the stats and check? 
> >
> With more than 64K 'classes' I suggest using a single FQ qdisc [1] and
> an eBPF program using the EDT (Earliest Departure Time) model.
>
> The BPF program would perform the classification, then find a data
> structure based on the 'class', and then update/maintain the class
> virtual times and skb->tstamp:
>
>   TBF = bpf_map_lookup_elem(&map, &classid);
>
>   uint64_t now = bpf_ktime_get_ns();
>   uint64_t time_to_send = max(TBF->time_to_send, now);
>
>   time_to_send += (u64)qdisc_pkt_len(skb) * NSEC_PER_SEC / TBF->rate;
>   if (time_to_send > TBF->max_horizon) {
>       return TC_ACT_SHOT;
>   }
>   TBF->time_to_send = time_to_send;
>   skb->tstamp = max(time_to_send, skb->tstamp);
>   if (time_to_send - now > TBF->ecn_horizon)
>       bpf_skb_ecn_set_ce(skb);
>   return TC_ACT_OK;
>
> tools/testing/selftests/bpf/progs/test_tc_edt.c shows something similar.
>
> [1] MQ + FQ if the device is multi-queue.
>
> Note that this setup scales very well on SMP, since we are no longer
> forced to use a single HTB hierarchy (protected by a single spinlock).

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: Unable to create htb tc classes more than 64K
  2019-08-17 19:04 ` Akshat Kakkar
  2019-08-20  6:26   ` Akshat Kakkar
  2019-08-21 22:06   ` Cong Wang
@ 2019-08-26 16:45 ` Jesper Dangaard Brouer
  2 siblings, 0 replies; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2019-08-26 16:45 UTC (permalink / raw)
To: Akshat Kakkar
Cc: brouer, Cong Wang, NetFilter, lartc, netdev, Eric Dumazet,
    Toke Høiland-Jørgensen, Anton Danilov

On Sun, 18 Aug 2019 00:34:33 +0530 Akshat Kakkar <akshat.1984@gmail.com> wrote:

> My goal is not just to make as many classes as possible, but also to
> use them to do rate limiting per IP per server. Say I have a list of
> 10000 IPs and more than 100 servers. I want a few IPs to get a speed
> of, say, 1Mbps per server, but others a speed of 2Mbps per server.
> How can I achieve this without having 10000 x 100 classes? These
> numbers can be larger than this, and hence I am looking for a generic
> solution.

As Eric Dumazet also points out indirectly, you will be creating a huge
bottleneck for SMP/multi-core CPUs, because your HTB root qdisc is a
serialization point for all egress traffic that all CPUs need to take a
lock on.

It sounds like your use-case is not global rate limiting; instead the
goal is to rate limit customers or services (to something significantly
lower than the NIC link speed). To get scalability in this case, you can
instead use the MQ qdisc (as Eric also points out). I have an example
script here [1] that shows how to set up MQ as the root qdisc and add
HTB leafs based on how many TX-queues the interface has via
/sys/class/net/$DEV/queues/tx-*/

[1] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/bin/tc_mq_htb_setup_example.sh

You are not done yet. To solve the TX-queue locking congestion, the
traffic needs to be redirected to the appropriate/correct TX CPUs.
This can be done either with an RSS (Receive Side Scaling) HW ethtool
adjustment (reduce the hash to L3 IPs only), with RPS (Receive Packet
Steering), or with XDP cpumap redirect. The XDP cpumap redirect feature
is implemented with XDP+TC BPF code here [2]. Notice that XPS can screw
with this, so there is an XPS disable script here [3].

[2] https://github.com/xdp-project/xdp-cpumap-tc
[3] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/bin/xps_setup.sh

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 16+ messages in thread
end of thread, other threads:[~2020-01-10 12:38 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-16 12:48 Unable to create htb tc classes more than 64K Akshat Kakkar
2019-08-16 17:45 ` Cong Wang
2019-08-17 12:46   ` Akshat Kakkar
2019-08-17 18:24     ` Cong Wang
2019-08-17 19:04       ` Akshat Kakkar
2019-08-20  6:26         ` Akshat Kakkar
2019-08-21 22:06         ` Cong Wang
2019-08-22  5:59           ` Akshat Kakkar
2019-08-25 17:52             ` Cong Wang
2019-08-26  6:32               ` Eric Dumazet
2019-08-26  7:28                 ` Toke Høiland-Jørgensen
2019-08-27 20:53                 ` Dave Taht
2019-08-27 21:09                   ` Eric Dumazet
2019-08-27 21:41                     ` Dave Taht
2020-01-10 12:38                 ` Akshat Kakkar
2019-08-26 16:45       ` Jesper Dangaard Brouer