* Why do we prefer skb->priority to tc filters? @ 2015-03-11 17:34 Cong Wang 2015-03-11 18:08 ` Cong Wang 0 siblings, 1 reply; 13+ messages in thread From: Cong Wang @ 2015-03-11 17:34 UTC (permalink / raw) To: netdev; +Cc: Jamal Hadi Salim, David Miller Hi, Not sure about classful Qdisc's, for classless ones like fq_codel, we also prefer skb->priority value over tc filters: if (TC_H_MAJ(skb->priority) == sch->handle && TC_H_MIN(skb->priority) > 0 && TC_H_MIN(skb->priority) <= q->flows_cnt) return TC_H_MIN(skb->priority); filter = rcu_dereference_bh(q->filter_list); if (!filter) return fq_codel_hash(q, skb) + 1; *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; result = tc_classify(skb, filter, &res); Given that skb->priority can be specified in user-space, doesn't this mean some application can always override our rules specified by tc filters? I think we should always respect tc filters over any application setting. For discussion, I mean something like below makes more sense for me: struct tcf_result res; int result; - if (TC_H_MAJ(skb->priority) == sch->handle && - TC_H_MIN(skb->priority) > 0 && - TC_H_MIN(skb->priority) <= q->flows_cnt) - return TC_H_MIN(skb->priority); - filter = rcu_dereference_bh(q->filter_list); - if (!filter) + if (!filter) { + if (TC_H_MAJ(skb->priority) == sch->handle && + TC_H_MIN(skb->priority) > 0 && + TC_H_MIN(skb->priority) <= q->flows_cnt) + return TC_H_MIN(skb->priority); + return fq_codel_hash(q, skb) + 1; + } *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; result = tc_classify(skb, filter, &res); What do you think? Thanks. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Why do we prefer skb->priority to tc filters? 2015-03-11 17:34 Why do we prefer skb->priority to tc filters? Cong Wang @ 2015-03-11 18:08 ` Cong Wang 2015-03-11 18:25 ` Eric Dumazet 0 siblings, 1 reply; 13+ messages in thread From: Cong Wang @ 2015-03-11 18:08 UTC (permalink / raw) To: netdev; +Cc: Jamal Hadi Salim, David Miller On Wed, Mar 11, 2015 at 10:34 AM, Cong Wang <cwang@twopensource.com> wrote: > > Given that skb->priority can be specified in user-space, doesn't this > mean some application can always override our rules specified by tc > filters? I think we should always respect tc filters over any > application setting. > Hmm, on the other hand, skb->priority can be changed in cgroup too, in this case the current behavior seems correct. :-/ Interesting. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Why do we prefer skb->priority to tc filters? 2015-03-11 18:08 ` Cong Wang @ 2015-03-11 18:25 ` Eric Dumazet 2015-03-11 19:18 ` Cong Wang 0 siblings, 1 reply; 13+ messages in thread From: Eric Dumazet @ 2015-03-11 18:25 UTC (permalink / raw) To: Cong Wang; +Cc: netdev, Jamal Hadi Salim, David Miller On Wed, 2015-03-11 at 11:08 -0700, Cong Wang wrote: > On Wed, Mar 11, 2015 at 10:34 AM, Cong Wang <cwang@twopensource.com> wrote: > > > > Given that skb->priority can be specified in user-space, doesn't this > > mean some application can always override our rules specified by tc > > filters? I think we should always respect tc filters over any > > application setting. > > > > Hmm, on the other hand, skb->priority can be changed in cgroup > too, in this case the current behavior seems correct. :-/ Interesting. Note that even HTB has the following code : static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) { struct htb_sched *q = qdisc_priv(sch); struct htb_class *cl; struct tcf_result res; struct tcf_proto *tcf; int result; /* allow to select class by setting skb->priority to valid classid; * note that nfmark can be used too by attaching filter fw with no * rules in it */ if (skb->priority == sch->handle) return HTB_DIRECT; /* X:0 (direct flow) selected */ cl = htb_find(skb->priority, sch); if (cl) { if (cl->level == 0) return cl; /* Start with inner filter chain if a non-leaf class is selected */ tcf = rcu_dereference_bh(cl->filter_list); } else { tcf = rcu_dereference_bh(q->filter_list); } So if skb->priority happens to match sch->handle or any class handle, we queue packet into a queue, without calling tc_classify() ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Why do we prefer skb->priority to tc filters? 2015-03-11 18:25 ` Eric Dumazet @ 2015-03-11 19:18 ` Cong Wang 2015-03-11 20:09 ` Eric Dumazet 0 siblings, 1 reply; 13+ messages in thread From: Cong Wang @ 2015-03-11 19:18 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Jamal Hadi Salim, David Miller On Wed, Mar 11, 2015 at 11:25 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2015-03-11 at 11:08 -0700, Cong Wang wrote: >> On Wed, Mar 11, 2015 at 10:34 AM, Cong Wang <cwang@twopensource.com> wrote: >> > >> > Given that skb->priority can be specified in user-space, doesn't this >> > mean some application can always override our rules specified by tc >> > filters? I think we should always respect tc filters over any >> > application setting. >> > >> >> Hmm, on the other hand, skb->priority can be changed in cgroup >> too, in this case the current behavior seems correct. :-/ Interesting. > > Note that even HTB has the following code : > I am not surprised. [...] > > So if skb->priority happens to match sch->handle or any class handle, > we queue packet into a queue, without calling tc_classify() This is exactly why I am asking, kernel respects user's choice so much that it could override even both cgroup prio and tc filters, I don't know what is the reason behind, but this means our tc filters can not work with an application which sets skb->priority. For example, our tc filters classify the packets from this socket into flow 1:1, but application sets 1:2 by itself... ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Why do we prefer skb->priority to tc filters? 2015-03-11 19:18 ` Cong Wang @ 2015-03-11 20:09 ` Eric Dumazet 2015-03-11 20:46 ` Cong Wang 0 siblings, 1 reply; 13+ messages in thread From: Eric Dumazet @ 2015-03-11 20:09 UTC (permalink / raw) To: Cong Wang; +Cc: netdev, Jamal Hadi Salim, David Miller On Wed, 2015-03-11 at 12:18 -0700, Cong Wang wrote: > This is exactly why I am asking, kernel respects user's choice so much > that it could override even both cgroup prio and tc filters, I don't know > what is the reason behind, but this means our tc filters can not work > with an application which sets skb->priority. For example, our tc filters > classify the packets from this socket into flow 1:1, but application > sets 1:2 by itself... Application cannot set whatever skb->priority, unless it really knows better than kernel. case SO_PRIORITY: if ((val >= 0 && val <= 6) || ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) sk->sk_priority = val; else ret = -EPERM; break; ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Why do we prefer skb->priority to tc filters? 2015-03-11 20:09 ` Eric Dumazet @ 2015-03-11 20:46 ` Cong Wang 2015-03-11 21:47 ` Eric Dumazet 0 siblings, 1 reply; 13+ messages in thread From: Cong Wang @ 2015-03-11 20:46 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Jamal Hadi Salim, David Miller On Wed, Mar 11, 2015 at 1:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2015-03-11 at 12:18 -0700, Cong Wang wrote: >> This is exactly why I am asking, kernel respects user's choice so much >> that it could override even both cgroup prio and tc filters, I don't know >> what is the reason behind, but this means our tc filters can not work >> with an application which sets skb->priority. For example, our tc filters >> classify the packets from this socket into flow 1:1, but application >> sets 1:2 by itself... > > Application cannot set whatever skb->priority, unless it really knows > better than kernel. > > case SO_PRIORITY: > if ((val >= 0 && val <= 6) || > ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) > sk->sk_priority = val; > else > ret = -EPERM; > break; > That is just a permission check when val > 6, given the fact most daemons have root permission, I doubt your argument makes a difference for discussion. At least with userns having root permission is more common. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Why do we prefer skb->priority to tc filters? 2015-03-11 20:46 ` Cong Wang @ 2015-03-11 21:47 ` Eric Dumazet 2015-03-11 22:12 ` Cong Wang 2015-03-12 7:53 ` Dmitry Sytchev 0 siblings, 2 replies; 13+ messages in thread From: Eric Dumazet @ 2015-03-11 21:47 UTC (permalink / raw) To: Cong Wang; +Cc: netdev, Jamal Hadi Salim, David Miller On Wed, 2015-03-11 at 13:46 -0700, Cong Wang wrote: > That is just a permission check when val > 6, given the fact most > daemons have root permission, I doubt your argument makes a difference > for discussion. At least with userns having root permission is more common. Some setups use ip[6]tables rules to mangle skb->priority to select a HTB class. Google definitely uses this model, as netfilter code runs on multiple cpus, while HTB classifier runs under qdisc spinlock, so far. If you believe root user should not set skb->priority to arbitrary values, this is a very different concern. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Why do we prefer skb->priority to tc filters? 2015-03-11 21:47 ` Eric Dumazet @ 2015-03-11 22:12 ` Cong Wang 2015-03-12 0:00 ` Eric Dumazet 2015-03-12 7:53 ` Dmitry Sytchev 1 sibling, 1 reply; 13+ messages in thread From: Cong Wang @ 2015-03-11 22:12 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Jamal Hadi Salim, David Miller On Wed, Mar 11, 2015 at 2:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2015-03-11 at 13:46 -0700, Cong Wang wrote: > >> That is just a permission check when val > 6, given the fact most >> daemons have root permission, I doubt your argument makes a difference >> for discussion. At least with userns having root permission is more common. > > Some setups use ip[6]tables rules to mangle skb->priority to select a > HTB class. > > Google definitely uses this model, as netfilter code runs on multiple > cpus, while HTB classifier runs under qdisc spinlock, so far. I knew we can modify skb->priority in a few ways, for example skbedit. That is not my concern, all what I am thinking is there is some way in application layer to bypass our tc filters, which is not expected to happen for me. Given our specific case, I want to propose to clear skb->priority after moving out of a netns: diff --git a/net/core/dev.c b/net/core/dev.c index 962ee9d..2301f01 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1694,6 +1694,7 @@ int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb) } skb_scrub_packet(skb, true); + skb->priority = 0; skb->protocol = eth_type_trans(skb, dev); skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN); ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: Why do we prefer skb->priority to tc filters? 2015-03-11 22:12 ` Cong Wang @ 2015-03-12 0:00 ` Eric Dumazet 2015-03-12 16:59 ` Cong Wang 0 siblings, 1 reply; 13+ messages in thread From: Eric Dumazet @ 2015-03-12 0:00 UTC (permalink / raw) To: Cong Wang; +Cc: netdev, Jamal Hadi Salim, David Miller On Wed, 2015-03-11 at 15:12 -0700, Cong Wang wrote: > I knew we can modify skb->priority in a few ways, for example skbedit. > > That is not my concern, all what I am thinking is there is some > way in application layer to bypass our tc filters, which is not expected > to happen for me. Given our specific case, I want to propose to clear > skb->priority after moving out of a netns: > > diff --git a/net/core/dev.c b/net/core/dev.c > index 962ee9d..2301f01 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -1694,6 +1694,7 @@ int __dev_forward_skb(struct net_device *dev, > struct sk_buff *skb) > } > > skb_scrub_packet(skb, true); > + skb->priority = 0; > skb->protocol = eth_type_trans(skb, dev); > skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN); This looks more reasonable than your prior change to fq_codel, sfq, and others. If you use mavclan or ipvlan, not all paths will use __dev_forward_skb() So I am guessing you use veth maybe. Who knows. Note that you could instead use : skb->priority = skb->priority & TC_PRIO_MAX; This way, TC_PRIO_CONTROL legitimate traffic would not be downgraded to TC_PRIO_BESTEFFORT. This all looks like a policy decision, and we probably should not hard code it in the kernel : Some users have trusted applications running in containers. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Why do we prefer skb->priority to tc filters? 2015-03-12 0:00 ` Eric Dumazet @ 2015-03-12 16:59 ` Cong Wang 2015-03-12 17:25 ` Cong Wang 0 siblings, 1 reply; 13+ messages in thread From: Cong Wang @ 2015-03-12 16:59 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Jamal Hadi Salim, David Miller On Wed, Mar 11, 2015 at 5:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2015-03-11 at 15:12 -0700, Cong Wang wrote: > >> I knew we can modify skb->priority in a few ways, for example skbedit. >> >> That is not my concern, all what I am thinking is there is some >> way in application layer to bypass our tc filters, which is not expected >> to happen for me. Given our specific case, I want to propose to clear >> skb->priority after moving out of a netns: >> >> diff --git a/net/core/dev.c b/net/core/dev.c >> index 962ee9d..2301f01 100644 >> --- a/net/core/dev.c >> +++ b/net/core/dev.c >> @@ -1694,6 +1694,7 @@ int __dev_forward_skb(struct net_device *dev, >> struct sk_buff *skb) >> } >> >> skb_scrub_packet(skb, true); >> + skb->priority = 0; >> skb->protocol = eth_type_trans(skb, dev); >> skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN); > > This looks more reasonable than your prior change to fq_codel, sfq, and > others. My previous change is only for discussion, never mean to send out. > > If you use mavclan or ipvlan, not all paths will use __dev_forward_skb() > That is intentional, because I am not sure about stacked device like macvlan or tunnel, it might make some sense to keep that value. > So I am guessing you use veth maybe. Who knows. > Exactly, otherwise I would probably just change skb_scrub_packet(). ;) > Note that you could instead use : > > skb->priority = skb->priority & TC_PRIO_MAX; > > This way, TC_PRIO_CONTROL legitimate traffic would not be downgraded to > TC_PRIO_BESTEFFORT. > > This all looks like a policy decision, and we probably should not hard > code it in the kernel : Some users have trusted applications running in > containers. > For veth case, I think we don't need to keep TC_PRIO_* either, because the packet will normally go to Rx path after that, where these values make no sense. (For my own case, we redirect it to eth0, but still it goes out of its original netns). Thanks. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Why do we prefer skb->priority to tc filters? 2015-03-12 16:59 ` Cong Wang @ 2015-03-12 17:25 ` Cong Wang 0 siblings, 0 replies; 13+ messages in thread From: Cong Wang @ 2015-03-12 17:25 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Jamal Hadi Salim, David Miller On Thu, Mar 12, 2015 at 9:59 AM, Cong Wang <cwang@twopensource.com> wrote: > On Wed, Mar 11, 2015 at 5:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> >> If you use mavclan or ipvlan, not all paths will use __dev_forward_skb() >> > > That is intentional, because I am not sure about stacked device > like macvlan or tunnel, it might make some sense to keep that value. > >> So I am guessing you use veth maybe. Who knows. >> > > > Exactly, otherwise I would probably just change skb_scrub_packet(). ;) Ok, in some mode macvlan forwards the packets to Rx path too, interesting... ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Why do we prefer skb->priority to tc filters? 2015-03-11 21:47 ` Eric Dumazet 2015-03-11 22:12 ` Cong Wang @ 2015-03-12 7:53 ` Dmitry Sytchev 2015-03-12 17:08 ` Cong Wang 1 sibling, 1 reply; 13+ messages in thread From: Dmitry Sytchev @ 2015-03-12 7:53 UTC (permalink / raw) To: netdev Sorry for slight offtopic, but for some reason, recent netfilter SET changes, which allow setting queue number and priority via ipset skbinfo extension, don't work with multiple HTB on top of multiq or mq. At the same time, tc filters set on outbound iface with skbedit works fine both for prio and queue number. How can I find the difference in their behaviour to trace where queue number set by ipset match gets lost? > Google definitely uses this model, as netfilter code runs on multiple > cpus, while HTB classifier runs under qdisc spinlock, so far. > > If you believe root user should not set skb->priority to arbitrary > values, this is a very different concern. -- Best regards, Dmitry Sytchev, IT Engineer ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Why do we prefer skb->priority to tc filters? 2015-03-12 7:53 ` Dmitry Sytchev @ 2015-03-12 17:08 ` Cong Wang 0 siblings, 0 replies; 13+ messages in thread From: Cong Wang @ 2015-03-12 17:08 UTC (permalink / raw) To: Dmitry Sytchev; +Cc: netdev On Thu, Mar 12, 2015 at 12:53 AM, Dmitry Sytchev <kbdfck@gmail.com> wrote: > Sorry for slight offtopic, but for some reason, recent netfilter SET > changes, which allow setting queue number and priority via ipset > skbinfo extension, don't work with multiple HTB on top of multiq or > mq. Can you show us your setup? `iptables -L` and `tc qd show dev ...`, `tc class show dev ...` etc. > At the same time, tc filters set on outbound iface with skbedit works > fine both for prio and queue number. > How can I find the difference in their behaviour to trace where queue > number set by ipset match gets lost? There are fewer places to set skb->queue_mapping than skb->priority. Since you are using mq, you can watch the `tc -s -d class show dev ...` output to see if your packets go to the desired queue. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2015-03-12 17:25 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-03-11 17:34 Why do we prefer skb->priority to tc filters? Cong Wang 2015-03-11 18:08 ` Cong Wang 2015-03-11 18:25 ` Eric Dumazet 2015-03-11 19:18 ` Cong Wang 2015-03-11 20:09 ` Eric Dumazet 2015-03-11 20:46 ` Cong Wang 2015-03-11 21:47 ` Eric Dumazet 2015-03-11 22:12 ` Cong Wang 2015-03-12 0:00 ` Eric Dumazet 2015-03-12 16:59 ` Cong Wang 2015-03-12 17:25 ` Cong Wang 2015-03-12 7:53 ` Dmitry Sytchev 2015-03-12 17:08 ` Cong Wang
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.