All of lore.kernel.org
 help / color / mirror / Atom feed
* Why do we prefer skb->priority to tc filters?
@ 2015-03-11 17:34 Cong Wang
  2015-03-11 18:08 ` Cong Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Cong Wang @ 2015-03-11 17:34 UTC (permalink / raw)
  To: netdev; +Cc: Jamal Hadi Salim, David Miller

Hi,


Not sure about classful Qdisc's, for classless ones like fq_codel, we
also prefer skb->priority value over tc filters:

        if (TC_H_MAJ(skb->priority) == sch->handle &&
            TC_H_MIN(skb->priority) > 0 &&
            TC_H_MIN(skb->priority) <= q->flows_cnt)
                return TC_H_MIN(skb->priority);

        filter = rcu_dereference_bh(q->filter_list);
        if (!filter)
                return fq_codel_hash(q, skb) + 1;

        *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
        result = tc_classify(skb, filter, &res);

Given that skb->priority can be specified in user-space, doesn't this
mean some application can always override our rules specified by tc
filters? I think we should always respect tc filters over any
application setting.

For discussion, I mean something like below makes more sense for me:

        struct tcf_result res;
        int result;

-       if (TC_H_MAJ(skb->priority) == sch->handle &&
-           TC_H_MIN(skb->priority) > 0 &&
-           TC_H_MIN(skb->priority) <= q->flows_cnt)
-               return TC_H_MIN(skb->priority);
-
        filter = rcu_dereference_bh(q->filter_list);
-       if (!filter)
+       if (!filter) {
+               if (TC_H_MAJ(skb->priority) == sch->handle &&
+                   TC_H_MIN(skb->priority) > 0 &&
+                   TC_H_MIN(skb->priority) <= q->flows_cnt)
+                       return TC_H_MIN(skb->priority);
+
                return fq_codel_hash(q, skb) + 1;
+       }

        *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
        result = tc_classify(skb, filter, &res);

What do you think?

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why do we prefer skb->priority to tc filters?
  2015-03-11 17:34 Why do we prefer skb->priority to tc filters? Cong Wang
@ 2015-03-11 18:08 ` Cong Wang
  2015-03-11 18:25   ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Cong Wang @ 2015-03-11 18:08 UTC (permalink / raw)
  To: netdev; +Cc: Jamal Hadi Salim, David Miller

On Wed, Mar 11, 2015 at 10:34 AM, Cong Wang <cwang@twopensource.com> wrote:
>
> Given that skb->priority can be specified in user-space, doesn't this
> mean some application can always override our rules specified by tc
> filters? I think we should always respect tc filters over any
> application setting.
>

Hmm, on the other hand, skb->priority can be changed in cgroup
too, in this case the current behavior seems correct. :-/ Interesting.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why do we prefer skb->priority to tc filters?
  2015-03-11 18:08 ` Cong Wang
@ 2015-03-11 18:25   ` Eric Dumazet
  2015-03-11 19:18     ` Cong Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2015-03-11 18:25 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, Jamal Hadi Salim, David Miller

On Wed, 2015-03-11 at 11:08 -0700, Cong Wang wrote:
> On Wed, Mar 11, 2015 at 10:34 AM, Cong Wang <cwang@twopensource.com> wrote:
> >
> > Given that skb->priority can be specified in user-space, doesn't this
> > mean some application can always override our rules specified by tc
> > filters? I think we should always respect tc filters over any
> > application setting.
> >
> 
> Hmm, on the other hand, skb->priority can be changed in cgroup
> too, in this case the current behavior seems correct. :-/ Interesting.

Note that even HTB has the following code :

static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
                                      int *qerr)
{
        struct htb_sched *q = qdisc_priv(sch);
        struct htb_class *cl;
        struct tcf_result res;
        struct tcf_proto *tcf;
        int result;

        /* allow to select class by setting skb->priority to valid classid;
         * note that nfmark can be used too by attaching filter fw with no
         * rules in it
         */
        if (skb->priority == sch->handle)
                return HTB_DIRECT;      /* X:0 (direct flow) selected */
        cl = htb_find(skb->priority, sch);
        if (cl) {
                if (cl->level == 0)
                        return cl;
                /* Start with inner filter chain if a non-leaf class is selected */
                tcf = rcu_dereference_bh(cl->filter_list);
        } else {
                tcf = rcu_dereference_bh(q->filter_list);
        }


So if skb->priority happens to match sch->handle or any class handle,
we queue packet into a queue, without calling tc_classify()

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why do we prefer skb->priority to tc filters?
  2015-03-11 18:25   ` Eric Dumazet
@ 2015-03-11 19:18     ` Cong Wang
  2015-03-11 20:09       ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Cong Wang @ 2015-03-11 19:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Jamal Hadi Salim, David Miller

On Wed, Mar 11, 2015 at 11:25 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-03-11 at 11:08 -0700, Cong Wang wrote:
>> On Wed, Mar 11, 2015 at 10:34 AM, Cong Wang <cwang@twopensource.com> wrote:
>> >
>> > Given that skb->priority can be specified in user-space, doesn't this
>> > mean some application can always override our rules specified by tc
>> > filters? I think we should always respect tc filters over any
>> > application setting.
>> >
>>
>> Hmm, on the other hand, skb->priority can be changed in cgroup
>> too, in this case the current behavior seems correct. :-/ Interesting.
>
> Note that even HTB has the following code :
>

I am not surprised.

[...]

>
> So if skb->priority happens to match sch->handle or any class handle,
> we queue packet into a queue, without calling tc_classify()

This is exactly why I am asking, kernel respects user's choice so much
that it could override even both cgroup prio and tc filters, I don't know
what is the reason behind, but this means our tc filters can not work
with an application which sets skb->priority. For example, our tc filters
classify the packets from this socket into flow 1:1, but application
sets 1:2 by itself...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why do we prefer skb->priority to tc filters?
  2015-03-11 19:18     ` Cong Wang
@ 2015-03-11 20:09       ` Eric Dumazet
  2015-03-11 20:46         ` Cong Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2015-03-11 20:09 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, Jamal Hadi Salim, David Miller

On Wed, 2015-03-11 at 12:18 -0700, Cong Wang wrote:
> This is exactly why I am asking, kernel respects user's choice so much
> that it could override even both cgroup prio and tc filters, I don't know
> what is the reason behind, but this means our tc filters can not work
> with an application which sets skb->priority. For example, our tc filters
> classify the packets from this socket into flow 1:1, but application
> sets 1:2 by itself...

Application cannot set whatever skb->priority, unless it really knows
better than kernel.

        case SO_PRIORITY:
                if ((val >= 0 && val <= 6) ||
                    ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
                        sk->sk_priority = val;
                else
                        ret = -EPERM;
                break;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why do we prefer skb->priority to tc filters?
  2015-03-11 20:09       ` Eric Dumazet
@ 2015-03-11 20:46         ` Cong Wang
  2015-03-11 21:47           ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Cong Wang @ 2015-03-11 20:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Jamal Hadi Salim, David Miller

On Wed, Mar 11, 2015 at 1:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-03-11 at 12:18 -0700, Cong Wang wrote:
>> This is exactly why I am asking, kernel respects user's choice so much
>> that it could override even both cgroup prio and tc filters, I don't know
>> what is the reason behind, but this means our tc filters can not work
>> with an application which sets skb->priority. For example, our tc filters
>> classify the packets from this socket into flow 1:1, but application
>> sets 1:2 by itself...
>
> Application cannot set whatever skb->priority, unless it really knows
> better than kernel.
>
>         case SO_PRIORITY:
>                 if ((val >= 0 && val <= 6) ||
>                     ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
>                         sk->sk_priority = val;
>                 else
>                         ret = -EPERM;
>                 break;
>

That is just a permission check when val > 6, given the fact most
daemons have root permission, I doubt your argument makes a difference
for discussion. At least with userns having root permission is more common.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why do we prefer skb->priority to tc filters?
  2015-03-11 20:46         ` Cong Wang
@ 2015-03-11 21:47           ` Eric Dumazet
  2015-03-11 22:12             ` Cong Wang
  2015-03-12  7:53             ` Dmitry Sytchev
  0 siblings, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2015-03-11 21:47 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, Jamal Hadi Salim, David Miller

On Wed, 2015-03-11 at 13:46 -0700, Cong Wang wrote:

> That is just a permission check when val > 6, given the fact most
> daemons have root permission, I doubt your argument makes a difference
> for discussion. At least with userns having root permission is more common.

Some setups use ip[6]tables rules to mangle skb->priority to select a
HTB class.

Google definitely uses this model, as netfilter code runs on multiple
cpus, while HTB classifier runs under qdisc spinlock, so far.

If you believe root user should not set skb->priority to arbitrary
values, this is a very different concern.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why do we prefer skb->priority to tc filters?
  2015-03-11 21:47           ` Eric Dumazet
@ 2015-03-11 22:12             ` Cong Wang
  2015-03-12  0:00               ` Eric Dumazet
  2015-03-12  7:53             ` Dmitry Sytchev
  1 sibling, 1 reply; 13+ messages in thread
From: Cong Wang @ 2015-03-11 22:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Jamal Hadi Salim, David Miller

On Wed, Mar 11, 2015 at 2:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-03-11 at 13:46 -0700, Cong Wang wrote:
>
>> That is just a permission check when val > 6, given the fact most
>> daemons have root permission, I doubt your argument makes a difference
>> for discussion. At least with userns having root permission is more common.
>
> Some setups use ip[6]tables rules to mangle skb->priority to select a
> HTB class.
>
> Google definitely uses this model, as netfilter code runs on multiple
> cpus, while HTB classifier runs under qdisc spinlock, so far.

I knew we can modify skb->priority in a few ways, for example skbedit.

That is not my concern, all what I am thinking is there is some
way in application layer to bypass our tc filters, which is not expected
to happen for me. Given our specific case, I want to propose to clear
skb->priority after moving out of a netns:

diff --git a/net/core/dev.c b/net/core/dev.c
index 962ee9d..2301f01 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1694,6 +1694,7 @@ int __dev_forward_skb(struct net_device *dev,
struct sk_buff *skb)
        }

        skb_scrub_packet(skb, true);
+       skb->priority = 0;
        skb->protocol = eth_type_trans(skb, dev);
        skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Why do we prefer skb->priority to tc filters?
  2015-03-11 22:12             ` Cong Wang
@ 2015-03-12  0:00               ` Eric Dumazet
  2015-03-12 16:59                 ` Cong Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2015-03-12  0:00 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, Jamal Hadi Salim, David Miller

On Wed, 2015-03-11 at 15:12 -0700, Cong Wang wrote:

> I knew we can modify skb->priority in a few ways, for example skbedit.
> 
> That is not my concern, all what I am thinking is there is some
> way in application layer to bypass our tc filters, which is not expected
> to happen for me. Given our specific case, I want to propose to clear
> skb->priority after moving out of a netns:
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 962ee9d..2301f01 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1694,6 +1694,7 @@ int __dev_forward_skb(struct net_device *dev,
> struct sk_buff *skb)
>         }
> 
>         skb_scrub_packet(skb, true);
> +       skb->priority = 0;
>         skb->protocol = eth_type_trans(skb, dev);
>         skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);

This looks more reasonable than your prior change to fq_codel, sfq, and
others.

If you use mavclan or ipvlan, not all paths will use __dev_forward_skb()

So I am guessing you use veth maybe. Who knows.

Note that you could instead use :

 skb->priority = skb->priority & TC_PRIO_MAX;

This way, TC_PRIO_CONTROL legitimate traffic would not be downgraded to
TC_PRIO_BESTEFFORT.

This all looks like a policy decision, and we probably should not hard
code it in the kernel : Some users have trusted applications running in
containers.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why do we prefer skb->priority to tc filters?
  2015-03-11 21:47           ` Eric Dumazet
  2015-03-11 22:12             ` Cong Wang
@ 2015-03-12  7:53             ` Dmitry Sytchev
  2015-03-12 17:08               ` Cong Wang
  1 sibling, 1 reply; 13+ messages in thread
From: Dmitry Sytchev @ 2015-03-12  7:53 UTC (permalink / raw)
  To: netdev

Sorry for slight offtopic, but for some reason, recent netfilter SET
changes, which allow setting queue number and priority via ipset
skbinfo extension, don't work with multiple HTB on top of multiq or
mq.
At the same time, tc filters set on outbound iface with skbedit works
fine both for prio and queue number.
How can I find the difference in their behaviour to trace where queue
number set by ipset match gets lost?

> Google definitely uses this model, as netfilter code runs on multiple
> cpus, while HTB classifier runs under qdisc spinlock, so far.
>
> If you believe root user should not set skb->priority to arbitrary
> values, this is a very different concern.



-- 
Best regards,

Dmitry Sytchev,
IT Engineer

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why do we prefer skb->priority to tc filters?
  2015-03-12  0:00               ` Eric Dumazet
@ 2015-03-12 16:59                 ` Cong Wang
  2015-03-12 17:25                   ` Cong Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Cong Wang @ 2015-03-12 16:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Jamal Hadi Salim, David Miller

On Wed, Mar 11, 2015 at 5:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-03-11 at 15:12 -0700, Cong Wang wrote:
>
>> I knew we can modify skb->priority in a few ways, for example skbedit.
>>
>> That is not my concern, all what I am thinking is there is some
>> way in application layer to bypass our tc filters, which is not expected
>> to happen for me. Given our specific case, I want to propose to clear
>> skb->priority after moving out of a netns:
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 962ee9d..2301f01 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1694,6 +1694,7 @@ int __dev_forward_skb(struct net_device *dev,
>> struct sk_buff *skb)
>>         }
>>
>>         skb_scrub_packet(skb, true);
>> +       skb->priority = 0;
>>         skb->protocol = eth_type_trans(skb, dev);
>>         skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
>
> This looks more reasonable than your prior change to fq_codel, sfq, and
> others.


My previous change is only for discussion, never mean to send out.

>
> If you use mavclan or ipvlan, not all paths will use __dev_forward_skb()
>

That is intentional, because I am not sure about stacked device
like macvlan or tunnel, it might make some sense to keep that value.

> So I am guessing you use veth maybe. Who knows.
>


Exactly, otherwise I would probably just change skb_scrub_packet(). ;)


> Note that you could instead use :
>
>  skb->priority = skb->priority & TC_PRIO_MAX;
>
> This way, TC_PRIO_CONTROL legitimate traffic would not be downgraded to
> TC_PRIO_BESTEFFORT.
>
> This all looks like a policy decision, and we probably should not hard
> code it in the kernel : Some users have trusted applications running in
> containers.
>

For veth case, I think we don't need to keep TC_PRIO_* either, because
the packet will normally go to Rx path after that, where these values make
no sense.

(For my own case, we redirect it to eth0, but still it goes out of its
original netns).

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why do we prefer skb->priority to tc filters?
  2015-03-12  7:53             ` Dmitry Sytchev
@ 2015-03-12 17:08               ` Cong Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Cong Wang @ 2015-03-12 17:08 UTC (permalink / raw)
  To: Dmitry Sytchev; +Cc: netdev

On Thu, Mar 12, 2015 at 12:53 AM, Dmitry Sytchev <kbdfck@gmail.com> wrote:
> Sorry for slight offtopic, but for some reason, recent netfilter SET
> changes, which allow setting queue number and priority via ipset
> skbinfo extension, don't work with multiple HTB on top of multiq or
> mq.

Can you show us your setup? `iptables -L` and `tc qd show dev ...`,
`tc class show dev ...` etc.


> At the same time, tc filters set on outbound iface with skbedit works
> fine both for prio and queue number.
> How can I find the difference in their behaviour to trace where queue
> number set by ipset match gets lost?

There are fewer places to set skb->queue_mapping than skb->priority.
Since you are using mq, you can watch the `tc -s -d class show dev ...`
output to see if your packets go to the desired queue.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why do we prefer skb->priority to tc filters?
  2015-03-12 16:59                 ` Cong Wang
@ 2015-03-12 17:25                   ` Cong Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Cong Wang @ 2015-03-12 17:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Jamal Hadi Salim, David Miller

On Thu, Mar 12, 2015 at 9:59 AM, Cong Wang <cwang@twopensource.com> wrote:
> On Wed, Mar 11, 2015 at 5:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> If you use mavclan or ipvlan, not all paths will use __dev_forward_skb()
>>
>
> That is intentional, because I am not sure about stacked device
> like macvlan or tunnel, it might make some sense to keep that value.
>
>> So I am guessing you use veth maybe. Who knows.
>>
>
>
> Exactly, otherwise I would probably just change skb_scrub_packet(). ;)

Ok, in some mode macvlan forwards the packets to Rx path too, interesting...

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-03-12 17:25 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-11 17:34 Why do we prefer skb->priority to tc filters? Cong Wang
2015-03-11 18:08 ` Cong Wang
2015-03-11 18:25   ` Eric Dumazet
2015-03-11 19:18     ` Cong Wang
2015-03-11 20:09       ` Eric Dumazet
2015-03-11 20:46         ` Cong Wang
2015-03-11 21:47           ` Eric Dumazet
2015-03-11 22:12             ` Cong Wang
2015-03-12  0:00               ` Eric Dumazet
2015-03-12 16:59                 ` Cong Wang
2015-03-12 17:25                   ` Cong Wang
2015-03-12  7:53             ` Dmitry Sytchev
2015-03-12 17:08               ` Cong Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.