netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Question: Patch:("net: sched: cbq: dont intepret cls results when asked to drop") may be not bug for branch LTS 5.10
       [not found] <4538d7d2-0d43-16b7-9f80-77355f08cc61@huawei.com>
@ 2023-01-17  1:03 ` shaozhengchao
       [not found] ` <CAM0EoM=rqF8K997AmC0VDncJ9LeA0PJku2BL96iiatAOiv1-vw@mail.gmail.com>
  1 sibling, 0 replies; 12+ messages in thread
From: shaozhengchao @ 2023-01-17  1:03 UTC (permalink / raw)
  To: Jamal Hadi Salim, Greg Kroah-Hartman
  Cc: zengyhkyle, David Miller, Sasha Levin, Cong Wang, Jiri Pirko,
	Jakub Kicinski, Eric Dumazet, Paolo Abeni, netdev, YueHaibing

+cc netdev@vger.kernel.org yuehaibing

On 2023/1/16 16:27, shaozhengchao wrote:
> When I analyzed the following LTS 5.10 patch, I had a small question:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=b2c917e510e5ddbc7896329c87d20036c8b82952
> 
> As described in this patch, res is obtained through the tcf_classify()
> interface. If result is TC_ACT_SHOT, res may be an abnormal value.
> Accessing class in res will cause abnormal access.
> 
> For LTS version 5.10, if tcf_classify() is to return a positive value,
> the classify hook function to the filter must be called, and the hook 
> function returns a positive number. Observe the classify function of 
> each filter. Generally, res is initialized in four scenarios.
> 1. res is assigned a value by res in the private member of each filter.
> Generally, kzalloc is used to assign initial values to res of various
> filters. Therefore, class in res is initialized to 0. Then use the
> tcf_bind_filter() interface to assign values to members in res.
> Therefore, value of class is assigned. For example, cls_basic.
> 2. The classify function of the filter directly assigns a value to the
> class of res, for example, cls_cgroup.
> 3. The filter classify function references tp and assigns a value to
> res, for example, cls_u32.
> 4. The change function of the filter references fh and assigns a value
> to class in res, for example, cls_rsvp.
> 
> This Mainline problem is caused by commit:3aa260559455 (" net/sched:
> store the last executed chain also for clsact egress") and
> commit:9410c9409d3e ("net: sched: Introduce ingress classification
> function"). I don't know if my analysis is correct, please help correct, 
> thank you very much.
> 
> Zhengchao Shao

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Question: Patch:("net: sched: cbq: dont intepret cls results when asked to drop") may be not bug for branch LTS 5.10
       [not found] ` <CAM0EoM=rqF8K997AmC0VDncJ9LeA0PJku2BL96iiatAOiv1-vw@mail.gmail.com>
@ 2023-01-17 19:07   ` Jamal Hadi Salim
  2023-01-18  0:10     ` Kyle Zeng
  0 siblings, 1 reply; 12+ messages in thread
From: Jamal Hadi Salim @ 2023-01-17 19:07 UTC (permalink / raw)
  To: shaozhengchao
  Cc: zengyhkyle, David Miller, Sasha Levin, Cong Wang, Jiri Pirko,
	Jakub Kicinski, Eric Dumazet, Paolo Abeni, Davide Caratti,
	Marcelo Ricardo Leitner, Linux Kernel Network Developers

+Cc netdev

On Tue, Jan 17, 2023 at 10:06 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> Trimmed Cc (+Davide).
>
> I am not sure i followed what you are saying because i dont see the
> relationship between the
> two commits. Did that patch(9410c9409d3e) cause a problem?
> How do you reproduce the issue that is caused by this patch that you are seeing?
>
> One of the challenges we have built over time is consumers of classification and
> action execution may not be fully conserving the semantics of the return code.
> The return code is a "verdict" on what happened; a safer approach is to get the
> return code to be either an error/success code. But that seems to be a
> separate issue.
>
> cheers,
> jamal
>
> On Mon, Jan 16, 2023 at 3:28 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
> >
> > When I analyzed the following LTS 5.10 patch, I had a small question:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=b2c917e510e5ddbc7896329c87d20036c8b82952
> >
> > As described in this patch, res is obtained through the tcf_classify()
> > interface. If result is TC_ACT_SHOT, res may be an abnormal value.
> > Accessing class in res will cause abnormal access.
> >
> > For LTS version 5.10, if tcf_classify() is to return a positive value,
> > the classify hook function to the filter must be called, and the hook
> > function returns a positive number. Observe the classify function of
> > each filter. Generally, res is initialized in four scenarios.
> > 1. res is assigned a value by res in the private member of each filter.
> > Generally, kzalloc is used to assign initial values to res of various
> > filters. Therefore, class in res is initialized to 0. Then use the
> > tcf_bind_filter() interface to assign values to members in res.
> > Therefore, value of class is assigned. For example, cls_basic.
> > 2. The classify function of the filter directly assigns a value to the
> > class of res, for example, cls_cgroup.
> > 3. The filter classify function references tp and assigns a value to
> > res, for example, cls_u32.
> > 4. The change function of the filter references fh and assigns a value
> > to class in res, for example, cls_rsvp.
> >
> > This Mainline problem is caused by commit:3aa260559455 (" net/sched:
> > store the last executed chain also for clsact egress") and
> > commit:9410c9409d3e ("net: sched: Introduce ingress classification
> > function"). I don't know if my analysis is correct, please help correct,
> > thank you very much.
> >
> > Zhengchao Shao

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Question: Patch:("net: sched: cbq: dont intepret cls results when asked to drop") may be not bug for branch LTS 5.10
  2023-01-17 19:07   ` Jamal Hadi Salim
@ 2023-01-18  0:10     ` Kyle Zeng
  2023-01-18 11:06       ` Davide Caratti
  2023-01-19  3:41       ` shaozhengchao
  0 siblings, 2 replies; 12+ messages in thread
From: Kyle Zeng @ 2023-01-18  0:10 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: shaozhengchao, David Miller, Sasha Levin, Cong Wang, Jiri Pirko,
	Jakub Kicinski, Eric Dumazet, Paolo Abeni, Davide Caratti,
	Marcelo Ricardo Leitner, Linux Kernel Network Developers

Hi Zhengchao,

I'm the finder of the vulnerability. In my initial report, there was a
more detailed explanation of why this bug happened. But it got left
out in the commit message.
So, I'll explain it here and see whether people want to patch the
actual root cause of the crash.

The underlying bug that this patch was trying to address is actually
in `__tcf_classify`. Notice that `struct tcf_result` is actually a
union type, so whenever the kernel sets res.goto_tp, it also sets
res.class. And this can happen inside `tcf_action_goto_chain_exec`. In
other words, `tcf_action_goto_chain_exec` will set res.class. Notice
that goto_chain can point back to itself, which causes an infinite
loop. To avoid the infinite loop situation, `__tcf_classify` checks
how many times the loop has been executed
(https://elixir.bootlin.com/linux/v6.1/source/net/sched/cls_api.c#L1586),
if it is more than a specific number, it will mark the result as
TC_ACT_SHOT and then return:

if (unlikely(limit++ >= max_reclassify_loop)) {
    ...
    return TC_ACT_SHOT;
}

However, when it returns in the infinite loop handler, it forgets to
clear whatever is in the `res` variable, which still holds pointers in
`goto_tp`. As a result, cbq_classify will think it is a valid
`res.class` and causes type confusion.

My initial proposed patch was to memset `res` before `return
TC_ACT_SHOT` in `__tcf_classify`, but it didn't get merged. But I
guess the merged patch is more generic.

BTW, I'm not sure whether it is a bug or it is intended in the merged
patch for this bug: moving `return NULL` statement in `cbq_classify`
results in a behavior change that is not documented anywhere:
previously, packets that return TC_ACT_QUEUED, TC_ACT_STOLEN, and
TC_ACT_TRAP will eventually return NULL, but now they will be passed
into `cbq_reclassify`. Is this expected?

Best,
Kyle Zeng

On Tue, Jan 17, 2023 at 12:07 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> +Cc netdev
>
> On Tue, Jan 17, 2023 at 10:06 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > Trimmed Cc (+Davide).
> >
> > I am not sure i followed what you are saying because i dont see the
> > relationship between the
> > two commits. Did that patch(9410c9409d3e) cause a problem?
> > How do you reproduce the issue that is caused by this patch that you are seeing?
> >
> > One of the challenges we have built over time is consumers of classification and
> > action execution may not be fully conserving the semantics of the return code.
> > The return code is a "verdict" on what happened; a safer approach is to get the
> > return code to be either an error/success code. But that seems to be a
> > separate issue.
> >
> > cheers,
> > jamal
> >
> > On Mon, Jan 16, 2023 at 3:28 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
> > >
> > > When I analyzed the following LTS 5.10 patch, I had a small question:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=b2c917e510e5ddbc7896329c87d20036c8b82952
> > >
> > > As described in this patch, res is obtained through the tcf_classify()
> > > interface. If result is TC_ACT_SHOT, res may be an abnormal value.
> > > Accessing class in res will cause abnormal access.
> > >
> > > For LTS version 5.10, if tcf_classify() is to return a positive value,
> > > the classify hook function to the filter must be called, and the hook
> > > function returns a positive number. Observe the classify function of
> > > each filter. Generally, res is initialized in four scenarios.
> > > 1. res is assigned a value by res in the private member of each filter.
> > > Generally, kzalloc is used to assign initial values to res of various
> > > filters. Therefore, class in res is initialized to 0. Then use the
> > > tcf_bind_filter() interface to assign values to members in res.
> > > Therefore, value of class is assigned. For example, cls_basic.
> > > 2. The classify function of the filter directly assigns a value to the
> > > class of res, for example, cls_cgroup.
> > > 3. The filter classify function references tp and assigns a value to
> > > res, for example, cls_u32.
> > > 4. The change function of the filter references fh and assigns a value
> > > to class in res, for example, cls_rsvp.
> > >
> > > This Mainline problem is caused by commit:3aa260559455 (" net/sched:
> > > store the last executed chain also for clsact egress") and
> > > commit:9410c9409d3e ("net: sched: Introduce ingress classification
> > > function"). I don't know if my analysis is correct, please help correct,
> > > thank you very much.
> > >
> > > Zhengchao Shao

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Question: Patch:("net: sched: cbq: dont intepret cls results when asked to drop") may be not bug for branch LTS 5.10
  2023-01-18  0:10     ` Kyle Zeng
@ 2023-01-18 11:06       ` Davide Caratti
  2023-01-18 13:06         ` Jamal Hadi Salim
  2023-01-19  3:41       ` shaozhengchao
  1 sibling, 1 reply; 12+ messages in thread
From: Davide Caratti @ 2023-01-18 11:06 UTC (permalink / raw)
  To: Kyle Zeng
  Cc: Jamal Hadi Salim, shaozhengchao, David Miller, Sasha Levin,
	Cong Wang, Jiri Pirko, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Marcelo Ricardo Leitner, Linux Kernel Network Developers

hello,

On Tue, Jan 17, 2023 at 05:10:58PM -0700, Kyle Zeng wrote:
> Hi Zhengchao,
> 
> I'm the finder of the vulnerability. In my initial report, there was a
> more detailed explanation of why this bug happened. But it got left
> out in the commit message.
> So, I'll explain it here and see whether people want to patch the
> actual root cause of the crash.
> 
> The underlying bug that this patch was trying to address is actually
> in `__tcf_classify`. Notice that `struct tcf_result` is actually a
> union type, so whenever the kernel sets res.goto_tp, it also sets
> res.class.

From what I see/remember, 'res' (struct tcf_result) is unassigned
unless the packet is matched by a classifier (i.e. it does not return
TC_ACT_UNSPEC).

When this match happens (__tcf_classify returns non-negative) and the
control action says TC_ACT_GOTO_CHAIN, res->goto_tp is written.
Like you say, 'res.class' is written as well because it's a union.

> And this can happen inside `tcf_action_goto_chain_exec`. In
> other words, `tcf_action_goto_chain_exec` will set res.class. Notice
> that goto_chain can point back to itself, which causes an infinite
> loop. To avoid the infinite loop situation, `__tcf_classify` checks
> how many times the loop has been executed
> (https://elixir.bootlin.com/linux/v6.1/source/net/sched/cls_api.c#L1586),
> if it is more than a specific number, it will mark the result as
> TC_ACT_SHOT and then return:
> 
> if (unlikely(limit++ >= max_reclassify_loop)) {
>     ...
>     return TC_ACT_SHOT;
> }

maybe there is an easier reproducer, something made of 2 TC actions.
The first one goes to a valid chain, and then the second one (executed from
within the chain) drops the packet. I think that unpatched CBQ scheduler 
will find 'res.class' with a value also there.
 
> However, when it returns in the infinite loop handler, it forgets to
> clear whatever is in the `res` variable, which still holds pointers in
> `goto_tp`. As a result, cbq_classify will think it is a valid
> `res.class` and causes type confusion.
> 
> My initial proposed patch was to memset `res` before `return
> TC_ACT_SHOT` in `__tcf_classify`, but it didn't get merged. But I
> guess the merged patch is more generic.

The merged patch looks good to me; however, I wonder if it's sufficient.
If I well read the code, there is still the possibility of hitting the
same problem on a patched kernel when TC_ACT_TRAP / TC_ACT_STOLEN is
returned after a 'goto chain' when the qdisc is CBQ.

I like Jamal's idea of sharing the reproducer :)

thanks!
-- 
davide

 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Question: Patch:("net: sched: cbq: dont intepret cls results when asked to drop") may be not bug for branch LTS 5.10
  2023-01-18 11:06       ` Davide Caratti
@ 2023-01-18 13:06         ` Jamal Hadi Salim
  2023-01-18 16:00           ` Davide Caratti
  2023-01-19  6:33           ` shaozhengchao
  0 siblings, 2 replies; 12+ messages in thread
From: Jamal Hadi Salim @ 2023-01-18 13:06 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Kyle Zeng, shaozhengchao, David Miller, Sasha Levin, Cong Wang,
	Jiri Pirko, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Marcelo Ricardo Leitner, Linux Kernel Network Developers

The reproducer the Kyle included back then was not useful - it seemed
like a cutnpaste
from some syzkaller dashboard (and didnt compile); however, for this one issue,
you can reproduce the problem by creating the infinite loop setup that
Davide describes.

The main issue is bigger than tcf_classify: It has to do with
interpretation of tcf_result
and the return codes.
I reviewed all consumers of tcf_results and only 3 (all happened to be qdiscs)
were fixed in that patch set. Note consumers include all objects in
the hierarchy
including classifiers and action.

Typically, the LinuxWay(tm) of cutting and pasting what other people before you
did works - but sometimes people forget environmental rules even when they are
documented. The main environmental rule that was at stake here is the return
(verdict) code said to drop the packet. The validity of tcf_result in
such a case is
questionable and setting it to 0 was irrelevant. So that is all the
fix had to do for -net.

The current return code is a "verdict" on what happened. Given that
there is potential
to misinterpret - as was seen here - a safer approach is to get the
return code to be either
an error/success code(eg ELOOP for the example being quoted) since
that is a more
common pattern and we store the "verdict code" in tcf_result (TC_ACT_SHOT).
I was planning to send an RFC patch for that.

I am still not clear on the correlation that Zhengchao Shao was making between
Davide's patch and this issue...

cheers,
jamal


On Wed, Jan 18, 2023 at 6:06 AM Davide Caratti <dcaratti@redhat.com> wrote:
>
> hello,
>
> On Tue, Jan 17, 2023 at 05:10:58PM -0700, Kyle Zeng wrote:
> > Hi Zhengchao,
> >
> > I'm the finder of the vulnerability. In my initial report, there was a
> > more detailed explanation of why this bug happened. But it got left
> > out in the commit message.
> > So, I'll explain it here and see whether people want to patch the
> > actual root cause of the crash.
> >
> > The underlying bug that this patch was trying to address is actually
> > in `__tcf_classify`. Notice that `struct tcf_result` is actually a
> > union type, so whenever the kernel sets res.goto_tp, it also sets
> > res.class.
>
> From what I see/remember, 'res' (struct tcf_result) is unassigned
> unless the packet is matched by a classifier (i.e. it does not return
> TC_ACT_UNSPEC).
>
> When this match happens (__tcf_classify returns non-negative) and the
> control action says TC_ACT_GOTO_CHAIN, res->goto_tp is written.
> Like you say, 'res.class' is written as well because it's a union.
>
> > And this can happen inside `tcf_action_goto_chain_exec`. In
> > other words, `tcf_action_goto_chain_exec` will set res.class. Notice
> > that goto_chain can point back to itself, which causes an infinite
> > loop. To avoid the infinite loop situation, `__tcf_classify` checks
> > how many times the loop has been executed
> > (https://elixir.bootlin.com/linux/v6.1/source/net/sched/cls_api.c#L1586),
> > if it is more than a specific number, it will mark the result as
> > TC_ACT_SHOT and then return:
> >
> > if (unlikely(limit++ >= max_reclassify_loop)) {
> >     ...
> >     return TC_ACT_SHOT;
> > }
>
> maybe there is an easier reproducer, something made of 2 TC actions.
> The first one goes to a valid chain, and then the second one (executed from
> within the chain) drops the packet. I think that unpatched CBQ scheduler
> will find 'res.class' with a value also there.
>
> > However, when it returns in the infinite loop handler, it forgets to
> > clear whatever is in the `res` variable, which still holds pointers in
> > `goto_tp`. As a result, cbq_classify will think it is a valid
> > `res.class` and causes type confusion.
> >
> > My initial proposed patch was to memset `res` before `return
> > TC_ACT_SHOT` in `__tcf_classify`, but it didn't get merged. But I
> > guess the merged patch is more generic.
>
> The merged patch looks good to me; however, I wonder if it's sufficient.
> If I well read the code, there is still the possibility of hitting the
> same problem on a patched kernel when TC_ACT_TRAP / TC_ACT_STOLEN is
> returned after a 'goto chain' when the qdisc is CBQ.
>
> I like Jamal's idea of sharing the reproducer :)


> thanks!
> --
> davide
>
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Question: Patch:("net: sched: cbq: dont intepret cls results when asked to drop") may be not bug for branch LTS 5.10
  2023-01-18 13:06         ` Jamal Hadi Salim
@ 2023-01-18 16:00           ` Davide Caratti
  2023-01-18 17:00             ` Jamal Hadi Salim
  2023-01-19  6:33           ` shaozhengchao
  1 sibling, 1 reply; 12+ messages in thread
From: Davide Caratti @ 2023-01-18 16:00 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Kyle Zeng, shaozhengchao, David Miller, Sasha Levin, Cong Wang,
	Jiri Pirko, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Marcelo Ricardo Leitner, Linux Kernel Network Developers

all good points!

On Wed, Jan 18, 2023 at 08:06:12AM -0500, Jamal Hadi Salim wrote:

> The main issue is bigger than tcf_classify: It has to do with
> interpretation of tcf_result and the return codes.

Some classful qdiscs ignore the value of res.class and just lookup the
configured classes based on the value of res.classid. In this case, the qdisc
code validates tcf_results, and the resulting class is always a valid pointer
with correct layout and size.

With HTB, CBQ, DRR, HFSC, QFQ and ATM it's possible that the TC classifier
selects a traffic class for a given packet and writes that pointer in 'res.class'.
When/if this happens, the qdisc doesn't need to use res.classid and then lookup for
the traffic class: it assumes that res.class is already a good pointer to a
struct of the correct type (struct hfsc_class for HFSC, for instance).

> The main environmental rule that was at stake here is the return
> (verdict) code said to drop the packet. The validity of tcf_result in
> such a case is questionable and setting it to 0 was irrelevant.

I remember that the first matchall implementation forgot the assignment
of 'res' and this caused a problem similar to the reported one [1]. In my opinion
we should consider to eliminate 'res.class' and its usage in the above qdiscs,
if we find out that current TC filters just write res.classid (not sure of what
cls_bpf does, though) and constantly write 0 to res.class.

[1] https://lore.kernel.org/netdev/b930159de5531a4d216a1cd2c2ef03aa41f421f9.1505562794.git.dcaratti@redhat.com/

> The current return code is a "verdict" on what happened. Given that
> there is potential to misinterpret - as was seen here - a safer approach is to get the
> return code to be either an error/success code(eg ELOOP for the example being quoted) since
> that is a more common pattern and we store the "verdict code" in tcf_result (TC_ACT_SHOT).
> I was planning to send an RFC patch for that.

well, the implementation of "goto_chain" actually abuses tcf_result:
since it's going to pass the packet to another classifier, it
temporarily stores a handle to the next filter in the tcf_result -   
instead of passing it through the stack. That is not a problem, unless
a packet hits the max_reclassify_loop and a CBQ qdisc that dereferences
'res.class' even with a packet drop :)

> I am still not clear on the correlation that Zhengchao Shao was making between
> Davide's patch and this issue...

In my understanding there is no correlation. Oh, last small note:

> On Wed, Jan 18, 2023 at 6:06 AM Davide Caratti <dcaratti@redhat.com> wrote:
> >
> >
> > The merged patch looks good to me; however, I wonder if it's sufficient.
> > If I well read the code, there is still the possibility of hitting the
> > same problem on a patched kernel when TC_ACT_TRAP / TC_ACT_STOLEN is
> > returned after a 'goto chain' when the qdisc is CBQ.

I think it is sufficient to avoid the crash, since the value of 'res.class'
should be a valid pointer (NULL most of the times?) after each call to 

tp->classify(skb, tp, res);

if the return value is different than TC_ACT_UNSPEC and TC_ACT_SHOT.
However, I still don't understand why cbq should charge classes when the
packet has been 'stolen' or 'trapped'...

-- 
davide


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Question: Patch:("net: sched: cbq: dont intepret cls results when asked to drop") may be not bug for branch LTS 5.10
  2023-01-18 16:00           ` Davide Caratti
@ 2023-01-18 17:00             ` Jamal Hadi Salim
  2023-01-19 16:34               ` Davide Caratti
  0 siblings, 1 reply; 12+ messages in thread
From: Jamal Hadi Salim @ 2023-01-18 17:00 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Kyle Zeng, shaozhengchao, David Miller, Sasha Levin, Cong Wang,
	Jiri Pirko, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Marcelo Ricardo Leitner, Linux Kernel Network Developers

[-- Attachment #1: Type: text/plain, Size: 4526 bytes --]

On Wed, Jan 18, 2023 at 11:00 AM Davide Caratti <dcaratti@redhat.com> wrote:
>
> all good points!
>
> On Wed, Jan 18, 2023 at 08:06:12AM -0500, Jamal Hadi Salim wrote:
>
> > The main issue is bigger than tcf_classify: It has to do with
> > interpretation of tcf_result and the return codes.
>
> Some classful qdiscs ignore the value of res.class and just lookup the
> configured classes based on the value of res.classid. In this case, the qdisc
> code validates tcf_results, and the resulting class is always a valid pointer
> with correct layout and size.

It is upto the qdisc implementation to decide how to operate. The
danger is always
in a hierarchy of calls any of which may go badly to assume that the
pointer is valid.
If we switch to returning proper error codes i think it will be easier
to relay such
information to the consumer of tcf_result.

> With HTB, CBQ, DRR, HFSC, QFQ and ATM it's possible that the TC classifier
> selects a traffic class for a given packet and writes that pointer in 'res.class'.

It's their choice of how to implement. I think actually cbq (which is
quiet a complex
scheduling algorithm) started this and the rest cutnpasted from it.

> When/if this happens, the qdisc doesn't need to use res.classid and then lookup for
> the traffic class: it assumes that res.class is already a good pointer to a
> struct of the correct type (struct hfsc_class for HFSC, for instance).
>

If all goes well, it is a good pointer.

> > The main environmental rule that was at stake here is the return
> > (verdict) code said to drop the packet. The validity of tcf_result in
> > such a case is questionable and setting it to 0 was irrelevant.
>
> I remember that the first matchall implementation forgot the assignment
> of 'res' and this caused a problem similar to the reported one [1]. In my opinion
> we should consider to eliminate 'res.class' and its usage in the above qdiscs,
> if we find out that current TC filters just write res.classid (not sure of what
> cls_bpf does, though) and constantly write 0 to res.class.
>
> [1] https://lore.kernel.org/netdev/b930159de5531a4d216a1cd2c2ef03aa41f421f9.1505562794.git.dcaratti@redhat.com/

Yikes. That should have been caught during the review.

> > The current return code is a "verdict" on what happened. Given that
> > there is potential to misinterpret - as was seen here - a safer approach is to get the
> > return code to be either an error/success code(eg ELOOP for the example being quoted) since
> > that is a more common pattern and we store the "verdict code" in tcf_result (TC_ACT_SHOT).
> > I was planning to send an RFC patch for that.
>
> well, the implementation of "goto_chain" actually abuses tcf_result:
> since it's going to pass the packet to another classifier, it
> temporarily stores a handle to the next filter in the tcf_result -
> instead of passing it through the stack. That is not a problem, unless
> a packet hits the max_reclassify_loop and a CBQ qdisc that dereferences
> 'res.class' even with a packet drop :)
>

The rule is tcf_results should be returning the results and execution
state to the
caller. The goto_chain maybe ok in this case. Also the mirred state
that i cant remember
who added was also reasonable if you wanted to defer redirect to the
caller instead
of returning ACT_STOLEN and friends.

[..]
> > On Wed, Jan 18, 2023 at 6:06 AM Davide Caratti <dcaratti@redhat.com> wrote:
> > >
> > >
> > > The merged patch looks good to me; however, I wonder if it's sufficient.
> > > If I well read the code, there is still the possibility of hitting the
> > > same problem on a patched kernel when TC_ACT_TRAP / TC_ACT_STOLEN is
> > > returned after a 'goto chain' when the qdisc is CBQ.
>
> I think it is sufficient to avoid the crash, since the value of 'res.class'
> should be a valid pointer (NULL most of the times?) after each call to
>
> tp->classify(skb, tp, res);
>
> if the return value is different than TC_ACT_UNSPEC and TC_ACT_SHOT.
> However, I still don't understand why cbq should charge classes when the
> packet has been 'stolen' or 'trapped'...

ACT_SHOT means (unfortunately ambiguous) "this is bad" - It means you
really cannot interpret the results.
That is not the case for the others.

BTW, I did create a patch initially when this issue surfaced but we
needed to get something
to net first. See attached. The proper way to do this is to have the
small surgery that
returns the errcode instead of verdict code and store TC_ACT_XXX in tcf_result
(in place of errcode).

cheers,
jamal

[-- Attachment #2: patchlet-fix-errcode-result --]
[-- Type: application/octet-stream, Size: 19147 bytes --]

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d5517719af4e..6b7c1e452703 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -319,13 +319,13 @@ struct Qdisc_ops {
 
 
 struct tcf_result {
+	u32 errcode;
 	union {
 		struct {
 			unsigned long	class;
 			u32		classid;
 		};
 		const struct tcf_proto *goto_tp;
-
 	};
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index b76fb37b381e..3a9d3d187aca 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3933,7 +3933,8 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 {
 #ifdef CONFIG_NET_CLS_ACT
 	struct mini_Qdisc *miniq = rcu_dereference_bh(dev->miniq_egress);
-	struct tcf_result cl_res;
+	struct tcf_result cl_res = { };
+	int res;
 
 	if (!miniq)
 		return skb;
@@ -3943,16 +3944,19 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	tc_skb_cb(skb)->post_ct = false;
 	mini_qdisc_bstats_cpu_update(miniq, skb);
 
-	switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, false)) {
-	case TC_ACT_OK:
-	case TC_ACT_RECLASSIFY:
-		skb->tc_index = TC_H_MIN(cl_res.classid);
-		break;
-	case TC_ACT_SHOT:
+	res = tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, false);
+	if (res == TC_ACT_SHOT || cl_res.errcode) {
 		mini_qdisc_qstats_cpu_drop(miniq);
 		*ret = NET_XMIT_DROP;
 		kfree_skb_reason(skb, SKB_DROP_REASON_TC_EGRESS);
 		return NULL;
+	}
+
+	switch (res) {
+	case TC_ACT_OK:
+	case TC_ACT_RECLASSIFY:
+		skb->tc_index = TC_H_MIN(cl_res.classid);
+		break;
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
 	case TC_ACT_TRAP:
@@ -5100,7 +5104,8 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 {
 #ifdef CONFIG_NET_CLS_ACT
 	struct mini_Qdisc *miniq = rcu_dereference_bh(skb->dev->miniq_ingress);
-	struct tcf_result cl_res;
+	struct tcf_result cl_res = { };
+	int res;
 
 	/* If there's at least one ingress present somewhere (so
 	 * we get here via enabled static key), remaining devices
@@ -5121,16 +5126,20 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 	skb->tc_at_ingress = 1;
 	mini_qdisc_bstats_cpu_update(miniq, skb);
 
-	switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, false)) {
-	case TC_ACT_OK:
-	case TC_ACT_RECLASSIFY:
-		skb->tc_index = TC_H_MIN(cl_res.classid);
-		break;
-	case TC_ACT_SHOT:
+	res = tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res,
+			   false);
+	if (res == TC_ACT_SHOT || cl_res.errcode) {
 		mini_qdisc_qstats_cpu_drop(miniq);
 		kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS);
 		*ret = NET_RX_DROP;
 		return NULL;
+	}
+
+	switch (res) {
+	case TC_ACT_OK:
+	case TC_ACT_RECLASSIFY:
+		skb->tc_index = TC_H_MIN(cl_res.classid);
+		break;
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
 	case TC_ACT_TRAP:
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 23d1cfa4f58c..db54e9fc393e 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1588,6 +1588,7 @@ static inline int __tcf_classify(struct sk_buff *skb,
 				       tp->chain->block->index,
 				       tp->prio & 0xffff,
 				       ntohs(tp->protocol));
+		res->errcode = ELOOP;
 		return TC_ACT_SHOT;
 	}
 
@@ -1619,8 +1620,10 @@ int tcf_classify(struct sk_buff *skb,
 			struct tcf_chain *fchain;
 
 			fchain = tcf_chain_lookup_rcu(block, ext->chain);
-			if (!fchain)
+			if (!fchain) {
+				res->errcode = ENOENT;
 				return TC_ACT_SHOT;
+			}
 
 			/* Consume, so cloned/redirect skbs won't inherit ext */
 			skb_ext_del(skb, TC_SKB_EXT);
@@ -1632,6 +1635,8 @@ int tcf_classify(struct sk_buff *skb,
 
 	ret = __tcf_classify(skb, tp, orig_tp, res, compat_mode,
 			     &last_executed_chain);
+	if (ret == TC_ACT_SHOT || res->errcode)
+		return TC_ACT_SHOT;
 
 	if (tc_skb_ext_tc_enabled()) {
 		/* If we missed on some chain */
@@ -1639,8 +1644,10 @@ int tcf_classify(struct sk_buff *skb,
 			struct tc_skb_cb *cb = tc_skb_cb(skb);
 
 			ext = tc_skb_ext_alloc(skb);
-			if (WARN_ON_ONCE(!ext))
+			if (WARN_ON_ONCE(!ext)) {
+				res->errcode = ENOMEM;
 				return TC_ACT_SHOT;
+			}
 			ext->chain = last_executed_chain;
 			ext->mru = cb->mru;
 			ext->post_ct = cb->post_ct;
@@ -3679,20 +3686,24 @@ EXPORT_SYMBOL(tcf_qevent_validate_change);
 struct sk_buff *tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch, struct sk_buff *skb,
 				  struct sk_buff **to_free, int *ret)
 {
-	struct tcf_result cl_res;
+	struct tcf_result cl_res = { };
 	struct tcf_proto *fl;
+	int res;
 
 	if (!qe->info.block_index)
 		return skb;
 
+
 	fl = rcu_dereference_bh(qe->filter_chain);
 
-	switch (tcf_classify(skb, NULL, fl, &cl_res, false)) {
-	case TC_ACT_SHOT:
+	res = tcf_classify(skb, NULL, fl, &cl_res, false);
+	if (cl_res.errcode || res == TC_ACT_SHOT) {
 		qdisc_qstats_drop(sch);
 		__qdisc_drop(skb, to_free);
 		*ret = __NET_XMIT_BYPASS;
 		return NULL;
+	}
+	switch (res) {
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
 	case TC_ACT_TRAP:
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index f52255fea652..1cbe833eb7a2 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -376,7 +376,7 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 {
 	struct atm_qdisc_data *p = qdisc_priv(sch);
 	struct atm_flow_data *flow;
-	struct tcf_result res;
+	struct tcf_result res = { };
 	int result;
 	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 
@@ -391,8 +391,11 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			fl = rcu_dereference_bh(flow->filter_list);
 			if (fl) {
 				result = tcf_classify(skb, NULL, fl, &res, true);
+				if (result == TC_ACT_SHOT || res.errcode)
+					return NULL;
 				if (result < 0)
 					continue;
+
 				flow = (struct atm_flow_data *)res.class;
 				if (!flow)
 					flow = lookup_flow(sch, res.classid);
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 3ed0c3342189..1922c3569097 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1656,7 +1656,7 @@ static u32 cake_classify(struct Qdisc *sch, struct cake_tin_data **t,
 {
 	struct cake_sched_data *q = qdisc_priv(sch);
 	struct tcf_proto *filter;
-	struct tcf_result res;
+	struct tcf_result res = { };
 	u16 flow = 0, host = 0;
 	int result;
 
@@ -1666,6 +1666,8 @@ static u32 cake_classify(struct Qdisc *sch, struct cake_tin_data **t,
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	result = tcf_classify(skb, NULL, filter, &res, false);
+	if (result == TC_ACT_SHOT || res.errcode)
+		return 0;
 
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
@@ -1674,8 +1676,6 @@ static u32 cake_classify(struct Qdisc *sch, struct cake_tin_data **t,
 		case TC_ACT_QUEUED:
 		case TC_ACT_TRAP:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-			fallthrough;
-		case TC_ACT_SHOT:
 			return 0;
 		}
 #endif
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 6568e17c4c63..e0e3bd7bf676 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -209,7 +209,7 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	struct cbq_class *cl = NULL;
 	u32 prio = skb->priority;
 	struct tcf_proto *fl;
-	struct tcf_result res;
+	struct tcf_result res = { };
 
 	/*
 	 *  Step 1. If skb->priority points to one of our classes, use it.
@@ -228,6 +228,8 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 		 * Step 2+n. Apply classifier.
 		 */
 		result = tcf_classify(skb, NULL, fl, &res, true);
+		if (result == TC_ACT_SHOT || res.errcode)
+			return NULL;
 		if (!fl || result < 0)
 			goto fallback;
 
@@ -250,8 +252,6 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 		case TC_ACT_TRAP:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 			fallthrough;
-		case TC_ACT_SHOT:
-			return NULL;
 		case TC_ACT_RECLASSIFY:
 			return cbq_reclassify(skb, cl);
 		}
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index e35a4e90f4e6..dc77fd501c3f 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -294,8 +294,8 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
 				      int *qerr)
 {
 	struct drr_sched *q = qdisc_priv(sch);
+	struct tcf_result res = { };
 	struct drr_class *cl;
-	struct tcf_result res;
 	struct tcf_proto *fl;
 	int result;
 
@@ -308,6 +308,8 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	fl = rcu_dereference_bh(q->filter_list);
 	result = tcf_classify(skb, NULL, fl, &res, false);
+	if (result == TC_ACT_SHOT || res.errcode)
+		return NULL;
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -315,8 +317,6 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
 		case TC_ACT_STOLEN:
 		case TC_ACT_TRAP:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-			fallthrough;
-		case TC_ACT_SHOT:
 			return NULL;
 		}
 #endif
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 401ffaf87d62..f392f60811c1 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -236,10 +236,13 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	if (TC_H_MAJ(skb->priority) == sch->handle)
 		skb->tc_index = TC_H_MIN(skb->priority);
 	else {
-		struct tcf_result res;
+		struct tcf_result res = { };
 		struct tcf_proto *fl = rcu_dereference_bh(p->filter_list);
 		int result = tcf_classify(skb, NULL, fl, &res, false);
 
+		if (result == TC_ACT_SHOT || res.errcode)
+			goto drop;
+
 		pr_debug("result %d class 0x%04x\n", result, res.classid);
 
 		switch (result) {
@@ -249,9 +252,6 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		case TC_ACT_TRAP:
 			__qdisc_drop(skb, to_free);
 			return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-
-		case TC_ACT_SHOT:
-			goto drop;
 #endif
 		case TC_ACT_OK:
 			skb->tc_index = TC_H_MIN(res.classid);
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index b10efeaf0629..e36b4c318d15 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -374,8 +374,8 @@ static struct ets_class *ets_classify(struct sk_buff *skb, struct Qdisc *sch,
 				      int *qerr)
 {
 	struct ets_sched *q = qdisc_priv(sch);
+	struct tcf_result res = { };
 	u32 band = skb->priority;
-	struct tcf_result res;
 	struct tcf_proto *fl;
 	int err;
 
@@ -383,14 +383,14 @@ static struct ets_class *ets_classify(struct sk_buff *skb, struct Qdisc *sch,
 	if (TC_H_MAJ(skb->priority) != sch->handle) {
 		fl = rcu_dereference_bh(q->filter_list);
 		err = tcf_classify(skb, NULL, fl, &res, false);
+		if (err == TC_ACT_SHOT || res.errcode)
+			return NULL;
 #ifdef CONFIG_NET_CLS_ACT
 		switch (err) {
 		case TC_ACT_STOLEN:
 		case TC_ACT_QUEUED:
 		case TC_ACT_TRAP:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-			fallthrough;
-		case TC_ACT_SHOT:
 			return NULL;
 		}
 #endif
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 8c4fee063436..8b8edee013d8 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -92,6 +92,8 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	result = tcf_classify(skb, NULL, filter, &res, false);
+	if (result == TC_ACT_SHOT || res.errcode)
+		return 0;
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -99,8 +101,6 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
 		case TC_ACT_QUEUED:
 		case TC_ACT_TRAP:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-			fallthrough;
-		case TC_ACT_SHOT:
 			return 0;
 		}
 #endif
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index 6980796d435d..309a6f66c2ba 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -81,7 +81,7 @@ static unsigned int fq_pie_classify(struct sk_buff *skb, struct Qdisc *sch,
 {
 	struct fq_pie_sched_data *q = qdisc_priv(sch);
 	struct tcf_proto *filter;
-	struct tcf_result res;
+	struct tcf_result res = { };
 	int result;
 
 	if (TC_H_MAJ(skb->priority) == sch->handle &&
@@ -95,6 +95,8 @@ static unsigned int fq_pie_classify(struct sk_buff *skb, struct Qdisc *sch,
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	result = tcf_classify(skb, NULL, filter, &res, false);
+	if (result == TC_ACT_SHOT || res.errcode)
+		return 0;
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -102,8 +104,6 @@ static unsigned int fq_pie_classify(struct sk_buff *skb, struct Qdisc *sch,
 		case TC_ACT_QUEUED:
 		case TC_ACT_TRAP:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-			fallthrough;
-		case TC_ACT_SHOT:
 			return 0;
 		}
 #endif
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 70b0c5873d32..a11c99726bae 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1116,7 +1116,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 {
 	struct hfsc_sched *q = qdisc_priv(sch);
 	struct hfsc_class *head, *cl;
-	struct tcf_result res;
+	struct tcf_result res = { };
 	struct tcf_proto *tcf;
 	int result;
 
@@ -1129,14 +1129,14 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	head = &q->root;
 	tcf = rcu_dereference_bh(q->root.filter_list);
 	while (tcf && (result = tcf_classify(skb, NULL, tcf, &res, false)) >= 0) {
+		if (result == TC_ACT_SHOT || res.errcode)
+			return NULL;
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
 		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 		case TC_ACT_TRAP:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-			fallthrough;
-		case TC_ACT_SHOT:
 			return NULL;
 		}
 #endif
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index e5b4bbf3ce3d..8a2dd9a8cfe5 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -218,7 +218,7 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
 {
 	struct htb_sched *q = qdisc_priv(sch);
 	struct htb_class *cl;
-	struct tcf_result res;
+	struct tcf_result res = { };
 	struct tcf_proto *tcf;
 	int result;
 
@@ -240,14 +240,15 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	while (tcf && (result = tcf_classify(skb, NULL, tcf, &res, false)) >= 0) {
+
+		if (result == TC_ACT_SHOT || res.errcode)
+			return NULL;
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
 		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 		case TC_ACT_TRAP:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-			fallthrough;
-		case TC_ACT_SHOT:
 			return NULL;
 		}
 #endif
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 75c9c860182b..3267d10e8ad4 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -30,21 +30,21 @@ static struct Qdisc *
 multiq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 {
 	struct multiq_sched_data *q = qdisc_priv(sch);
-	u32 band;
-	struct tcf_result res;
 	struct tcf_proto *fl = rcu_dereference_bh(q->filter_list);
+	struct tcf_result res = {};
+	u32 band;
 	int err;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	err = tcf_classify(skb, NULL, fl, &res, false);
+	if (err == TC_ACT_SHOT || res.errcode)
+		return NULL;
 #ifdef CONFIG_NET_CLS_ACT
 	switch (err) {
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
 	case TC_ACT_TRAP:
 		*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-		fallthrough;
-	case TC_ACT_SHOT:
 		return NULL;
 	}
 #endif
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index fdc5ef52c3ee..a934fbe59b29 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -32,7 +32,7 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 {
 	struct prio_sched_data *q = qdisc_priv(sch);
 	u32 band = skb->priority;
-	struct tcf_result res;
+	struct tcf_result res = { };
 	struct tcf_proto *fl;
 	int err;
 
@@ -40,14 +40,15 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	if (TC_H_MAJ(skb->priority) != sch->handle) {
 		fl = rcu_dereference_bh(q->filter_list);
 		err = tcf_classify(skb, NULL, fl, &res, false);
+		if (err == TC_ACT_SHOT || res.errcode)
+			return NULL;
+
 #ifdef CONFIG_NET_CLS_ACT
 		switch (err) {
 		case TC_ACT_STOLEN:
 		case TC_ACT_QUEUED:
 		case TC_ACT_TRAP:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-			fallthrough;
-		case TC_ACT_SHOT:
 			return NULL;
 		}
 #endif
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index cf5ebe43b3b4..0c23c7a1e76d 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -670,7 +670,7 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 {
 	struct qfq_sched *q = qdisc_priv(sch);
 	struct qfq_class *cl;
-	struct tcf_result res;
+	struct tcf_result res = { };
 	struct tcf_proto *fl;
 	int result;
 
@@ -684,6 +684,8 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	fl = rcu_dereference_bh(q->filter_list);
 	result = tcf_classify(skb, NULL, fl, &res, false);
+	if (result == TC_ACT_SHOT || res.errcode)
+		return NULL;
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -691,8 +693,6 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 		case TC_ACT_STOLEN:
 		case TC_ACT_TRAP:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-			fallthrough;
-		case TC_ACT_SHOT:
 			return NULL;
 		}
 #endif
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 1871a1c0224d..7da90c7b0bec 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -254,10 +254,12 @@ static bool sfb_rate_limit(struct sk_buff *skb, struct sfb_sched_data *q)
 static bool sfb_classify(struct sk_buff *skb, struct tcf_proto *fl,
 			 int *qerr, u32 *salt)
 {
-	struct tcf_result res;
+	struct tcf_result res = { };
 	int result;
 
 	result = tcf_classify(skb, NULL, fl, &res, false);
+	if (result == TC_ACT_SHOT || res.errcode)
+		return false;
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -265,9 +267,7 @@ static bool sfb_classify(struct sk_buff *skb, struct tcf_proto *fl,
 		case TC_ACT_QUEUED:
 		case TC_ACT_TRAP:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-			fallthrough;
-		case TC_ACT_SHOT:
-			return false;
+			return NULL;
 		}
 #endif
 		*salt = TC_H_MIN(res.classid);
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index abd436307d6a..23befa90519c 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -164,7 +164,7 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 				 int *qerr)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
-	struct tcf_result res;
+	struct tcf_result res = { };
 	struct tcf_proto *fl;
 	int result;
 
@@ -179,6 +179,8 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	result = tcf_classify(skb, NULL, fl, &res, false);
+	if (result == TC_ACT_SHOT || res.errcode)
+		return 0;
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
@@ -186,8 +188,6 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 		case TC_ACT_QUEUED:
 		case TC_ACT_TRAP:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-			fallthrough;
-		case TC_ACT_SHOT:
 			return 0;
 		}
 #endif

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Question: Patch:("net: sched: cbq: dont intepret cls results when asked to drop") may be not bug for branch LTS 5.10
  2023-01-18  0:10     ` Kyle Zeng
  2023-01-18 11:06       ` Davide Caratti
@ 2023-01-19  3:41       ` shaozhengchao
  1 sibling, 0 replies; 12+ messages in thread
From: shaozhengchao @ 2023-01-19  3:41 UTC (permalink / raw)
  To: Kyle Zeng, Jamal Hadi Salim
  Cc: David Miller, Sasha Levin, Cong Wang, Jiri Pirko, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, Davide Caratti,
	Marcelo Ricardo Leitner, Linux Kernel Network Developers


Thank you for your answer.

On 2023/1/18 8:10, Kyle Zeng wrote:
> Hi Zhengchao,
> 
> I'm the finder of the vulnerability. In my initial report, there was a
> more detailed explanation of why this bug happened. But it got left
> out in the commit message.
> So, I'll explain it here and see whether people want to patch the
> actual root cause of the crash.
> 
> The underlying bug that this patch was trying to address is actually
> in `__tcf_classify`. Notice that `struct tcf_result` is actually a
> union type, so whenever the kernel sets res.goto_tp, it also sets
> res.class. And this can happen inside `tcf_action_goto_chain_exec`. In
> other words, `tcf_action_goto_chain_exec` will set res.class. Notice
> that goto_chain can point back to itself, which causes an infinite
> loop. To avoid the infinite loop situation, `__tcf_classify` checks
> how many times the loop has been executed
> (https://elixir.bootlin.com/linux/v6.1/source/net/sched/cls_api.c#L1586),
> if it is more than a specific number, it will mark the result as
> TC_ACT_SHOT and then return:
> 
> if (unlikely(limit++ >= max_reclassify_loop)) {
>      ...
>      return TC_ACT_SHOT;
> }
> 
> However, when it returns in the infinite loop handler, it forgets to
> clear whatever is in the `res` variable, which still holds pointers in
> `goto_tp`. As a result, cbq_classify will think it is a valid
> `res.class` and causes type confusion.

It's very meaningful for me to understand that patch. I think I've
missed the path where there's a problem when I analyze the patch.
> 
> My initial proposed patch was to memset `res` before `return
> TC_ACT_SHOT` in `__tcf_classify`, but it didn't get merged. But I
> guess the merged patch is more generic.
> 
> BTW, I'm not sure whether it is a bug or it is intended in the merged
> patch for this bug: moving `return NULL` statement in `cbq_classify`
> results in a behavior change that is not documented anywhere:
> previously, packets that return TC_ACT_QUEUED, TC_ACT_STOLEN, and
> TC_ACT_TRAP will eventually return NULL, but now they will be passed
> into `cbq_reclassify`. Is this expected?
> 
> Best,
> Kyle Zeng
> 
> On Tue, Jan 17, 2023 at 12:07 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>
>> +Cc netdev
>>
>> On Tue, Jan 17, 2023 at 10:06 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>>
>>> Trimmed Cc (+Davide).
>>>
>>> I am not sure i followed what you are saying because i dont see the
>>> relationship between the
>>> two commits. Did that patch(9410c9409d3e) cause a problem?
>>> How do you reproduce the issue that is caused by this patch that you are seeing?
>>>
>>> One of the challenges we have built over time is consumers of classification and
>>> action execution may not be fully conserving the semantics of the return code.
>>> The return code is a "verdict" on what happened; a safer approach is to get the
>>> return code to be either an error/success code. But that seems to be a
>>> separate issue.
>>>
>>> cheers,
>>> jamal
>>>
>>> On Mon, Jan 16, 2023 at 3:28 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
>>>>
>>>> When I analyzed the following LTS 5.10 patch, I had a small question:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=b2c917e510e5ddbc7896329c87d20036c8b82952
>>>>
>>>> As described in this patch, res is obtained through the tcf_classify()
>>>> interface. If result is TC_ACT_SHOT, res may be an abnormal value.
>>>> Accessing class in res will cause abnormal access.
>>>>
>>>> For LTS version 5.10, if tcf_classify() is to return a positive value,
>>>> the classify hook function to the filter must be called, and the hook
>>>> function returns a positive number. Observe the classify function of
>>>> each filter. Generally, res is initialized in four scenarios.
>>>> 1. res is assigned a value by res in the private member of each filter.
>>>> Generally, kzalloc is used to assign initial values to res of various
>>>> filters. Therefore, class in res is initialized to 0. Then use the
>>>> tcf_bind_filter() interface to assign values to members in res.
>>>> Therefore, value of class is assigned. For example, cls_basic.
>>>> 2. The classify function of the filter directly assigns a value to the
>>>> class of res, for example, cls_cgroup.
>>>> 3. The filter classify function references tp and assigns a value to
>>>> res, for example, cls_u32.
>>>> 4. The change function of the filter references fh and assigns a value
>>>> to class in res, for example, cls_rsvp.
>>>>
>>>> This Mainline problem is caused by commit:3aa260559455 (" net/sched:
>>>> store the last executed chain also for clsact egress") and
>>>> commit:9410c9409d3e ("net: sched: Introduce ingress classification
>>>> function"). I don't know if my analysis is correct, please help correct,
>>>> thank you very much.
>>>>
>>>> Zhengchao Shao

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Question: Patch:("net: sched: cbq: dont intepret cls results when asked to drop") may be not bug for branch LTS 5.10
  2023-01-18 13:06         ` Jamal Hadi Salim
  2023-01-18 16:00           ` Davide Caratti
@ 2023-01-19  6:33           ` shaozhengchao
  2023-01-19  7:00             ` shaozhengchao
  1 sibling, 1 reply; 12+ messages in thread
From: shaozhengchao @ 2023-01-19  6:33 UTC (permalink / raw)
  To: Jamal Hadi Salim, Davide Caratti
  Cc: Kyle Zeng, David Miller, Sasha Levin, Cong Wang, Jiri Pirko,
	Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Marcelo Ricardo Leitner, Linux Kernel Network Developers

Hi Jamal:

On 2023/1/18 21:06, Jamal Hadi Salim wrote:
> The reproducer the Kyle included back then was not useful - it seemed
> like a cutnpaste
> from some syzkaller dashboard (and didnt compile); however, for this one issue,
> you can reproduce the problem by creating the infinite loop setup that
> Davide describes.
> 
> The main issue is bigger than tcf_classify: It has to do with
> interpretation of tcf_result
> and the return codes.
> I reviewed all consumers of tcf_results and only 3 (all happened to be qdiscs)
> were fixed in that patch set. Note consumers include all objects in
> the hierarchy
> including classifiers and action.
> 
> Typically, the LinuxWay(tm) of cutting and pasting what other people before you
> did works - but sometimes people forget environmental rules even when they are
> documented. The main environmental rule that was at stake here is the return
> (verdict) code said to drop the packet. The validity of tcf_result in
> such a case is
> questionable and setting it to 0 was irrelevant. So that is all the
> fix had to do for -net.
> 
> The current return code is a "verdict" on what happened. Given that
> there is potential
> to misinterpret - as was seen here - a safer approach is to get the
> return code to be either
> an error/success code(eg ELOOP for the example being quoted) since
> that is a more
> common pattern and we store the "verdict code" in tcf_result (TC_ACT_SHOT).
> I was planning to send an RFC patch for that.
> 
> I am still not clear on the correlation that Zhengchao Shao was making between
> Davide's patch and this issue...
> 
I'm just looking for the specific possible root cause of the issue.
Please help check whether the possible causes are as follows:
1. __tcf_classify returns TC_ACT_UNSPEC,tc_skb_ext_alloc allocation 
failure, and the res may be abnormal. Maybe fix commit:9410c9409d3e
("net: sched: Introduce ingress classification function")
2.tcf_chain_lookup_rcu return NULL,and tcf_classify will return 
TC_ACT_SHOT. In this way, res is abnormal. Oh, I am sorry. In
cbq_classify, pass NULL as block. So tcf_chain_lookup_rcu will not be
called in tcf_classify. Ignore this.

Thank you again for your careful answer.

Zhengchao Shao
> cheers,
> jamal
> 
> 
> On Wed, Jan 18, 2023 at 6:06 AM Davide Caratti <dcaratti@redhat.com> wrote:
>>
>> hello,
>>
>> On Tue, Jan 17, 2023 at 05:10:58PM -0700, Kyle Zeng wrote:
>>> Hi Zhengchao,
>>>
>>> I'm the finder of the vulnerability. In my initial report, there was a
>>> more detailed explanation of why this bug happened. But it got left
>>> out in the commit message.
>>> So, I'll explain it here and see whether people want to patch the
>>> actual root cause of the crash.
>>>
>>> The underlying bug that this patch was trying to address is actually
>>> in `__tcf_classify`. Notice that `struct tcf_result` is actually a
>>> union type, so whenever the kernel sets res.goto_tp, it also sets
>>> res.class.
>>
>>  From what I see/remember, 'res' (struct tcf_result) is unassigned
>> unless the packet is matched by a classifier (i.e. it does not return
>> TC_ACT_UNSPEC).
>>
>> When this match happens (__tcf_classify returns non-negative) and the
>> control action says TC_ACT_GOTO_CHAIN, res->goto_tp is written.
>> Like you say, 'res.class' is written as well because it's a union.
>>
>>> And this can happen inside `tcf_action_goto_chain_exec`. In
>>> other words, `tcf_action_goto_chain_exec` will set res.class. Notice
>>> that goto_chain can point back to itself, which causes an infinite
>>> loop. To avoid the infinite loop situation, `__tcf_classify` checks
>>> how many times the loop has been executed
>>> (https://elixir.bootlin.com/linux/v6.1/source/net/sched/cls_api.c#L1586),
>>> if it is more than a specific number, it will mark the result as
>>> TC_ACT_SHOT and then return:
>>>
>>> if (unlikely(limit++ >= max_reclassify_loop)) {
>>>      ...
>>>      return TC_ACT_SHOT;
>>> }
>>
>> maybe there is an easier reproducer, something made of 2 TC actions.
>> The first one goes to a valid chain, and then the second one (executed from
>> within the chain) drops the packet. I think that unpatched CBQ scheduler
>> will find 'res.class' with a value also there.
>>
>>> However, when it returns in the infinite loop handler, it forgets to
>>> clear whatever is in the `res` variable, which still holds pointers in
>>> `goto_tp`. As a result, cbq_classify will think it is a valid
>>> `res.class` and causes type confusion.
>>>
>>> My initial proposed patch was to memset `res` before `return
>>> TC_ACT_SHOT` in `__tcf_classify`, but it didn't get merged. But I
>>> guess the merged patch is more generic.
>>
>> The merged patch looks good to me; however, I wonder if it's sufficient.
>> If I well read the code, there is still the possibility of hitting the
>> same problem on a patched kernel when TC_ACT_TRAP / TC_ACT_STOLEN is
>> returned after a 'goto chain' when the qdisc is CBQ.
>>
>> I like Jamal's idea of sharing the reproducer :)
> 
> 
>> thanks!
>> --
>> davide
>>
>>
>>
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Question: Patch:("net: sched: cbq: dont intepret cls results when asked to drop") may be not bug for branch LTS 5.10
  2023-01-19  6:33           ` shaozhengchao
@ 2023-01-19  7:00             ` shaozhengchao
  0 siblings, 0 replies; 12+ messages in thread
From: shaozhengchao @ 2023-01-19  7:00 UTC (permalink / raw)
  To: Jamal Hadi Salim, Davide Caratti
  Cc: Kyle Zeng, David Miller, Sasha Levin, Cong Wang, Jiri Pirko,
	Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Marcelo Ricardo Leitner, Linux Kernel Network Developers



On 2023/1/19 14:33, shaozhengchao wrote:
> Hi Jamal:
> 
> On 2023/1/18 21:06, Jamal Hadi Salim wrote:
>> The reproducer the Kyle included back then was not useful - it seemed
>> like a cutnpaste
>> from some syzkaller dashboard (and didnt compile); however, for this 
>> one issue,
>> you can reproduce the problem by creating the infinite loop setup that
>> Davide describes.
>>
>> The main issue is bigger than tcf_classify: It has to do with
>> interpretation of tcf_result
>> and the return codes.
>> I reviewed all consumers of tcf_results and only 3 (all happened to be 
>> qdiscs)
>> were fixed in that patch set. Note consumers include all objects in
>> the hierarchy
>> including classifiers and action.
>>
>> Typically, the LinuxWay(tm) of cutting and pasting what other people 
>> before you
>> did works - but sometimes people forget environmental rules even when 
>> they are
>> documented. The main environmental rule that was at stake here is the 
>> return
>> (verdict) code said to drop the packet. The validity of tcf_result in
>> such a case is
>> questionable and setting it to 0 was irrelevant. So that is all the
>> fix had to do for -net.
>>
>> The current return code is a "verdict" on what happened. Given that
>> there is potential
>> to misinterpret - as was seen here - a safer approach is to get the
>> return code to be either
>> an error/success code(eg ELOOP for the example being quoted) since
>> that is a more
>> common pattern and we store the "verdict code" in tcf_result 
>> (TC_ACT_SHOT).
>> I was planning to send an RFC patch for that.
>>
>> I am still not clear on the correlation that Zhengchao Shao was making 
>> between
>> Davide's patch and this issue...
>>
> I'm just looking for the specific possible root cause of the issue.
> Please help check whether the possible causes are as follows:
> 1. __tcf_classify returns TC_ACT_UNSPEC,tc_skb_ext_alloc allocation 
> failure, and the res may be abnormal. Maybe fix commit:9410c9409d3e
> ("net: sched: Introduce ingress classification function")
> 2.tcf_chain_lookup_rcu return NULL,and tcf_classify will return 
> TC_ACT_SHOT. In this way, res is abnormal. Oh, I am sorry. In
> cbq_classify, pass NULL as block. So tcf_chain_lookup_rcu will not be
> called in tcf_classify. Ignore this.
> 
> Thank you again for your careful answer.
> 
> Zhengchao Shao

Last I just wonder whether the fix tag of that patch is corret?
whether the issue could be triggered by other paths except Kyle
describe.
How do you know it's fixed Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2").

Thank you.

>> cheers,
>> jamal
>>
>>
>> On Wed, Jan 18, 2023 at 6:06 AM Davide Caratti <dcaratti@redhat.com> 
>> wrote:
>>>
>>> hello,
>>>
>>> On Tue, Jan 17, 2023 at 05:10:58PM -0700, Kyle Zeng wrote:
>>>> Hi Zhengchao,
>>>>
>>>> I'm the finder of the vulnerability. In my initial report, there was a
>>>> more detailed explanation of why this bug happened. But it got left
>>>> out in the commit message.
>>>> So, I'll explain it here and see whether people want to patch the
>>>> actual root cause of the crash.
>>>>
>>>> The underlying bug that this patch was trying to address is actually
>>>> in `__tcf_classify`. Notice that `struct tcf_result` is actually a
>>>> union type, so whenever the kernel sets res.goto_tp, it also sets
>>>> res.class.
>>>
>>>  From what I see/remember, 'res' (struct tcf_result) is unassigned
>>> unless the packet is matched by a classifier (i.e. it does not return
>>> TC_ACT_UNSPEC).
>>>
>>> When this match happens (__tcf_classify returns non-negative) and the
>>> control action says TC_ACT_GOTO_CHAIN, res->goto_tp is written.
>>> Like you say, 'res.class' is written as well because it's a union.
>>>
>>>> And this can happen inside `tcf_action_goto_chain_exec`. In
>>>> other words, `tcf_action_goto_chain_exec` will set res.class. Notice
>>>> that goto_chain can point back to itself, which causes an infinite
>>>> loop. To avoid the infinite loop situation, `__tcf_classify` checks
>>>> how many times the loop has been executed
>>>> (https://elixir.bootlin.com/linux/v6.1/source/net/sched/cls_api.c#L1586),
>>>> if it is more than a specific number, it will mark the result as
>>>> TC_ACT_SHOT and then return:
>>>>
>>>> if (unlikely(limit++ >= max_reclassify_loop)) {
>>>>      ...
>>>>      return TC_ACT_SHOT;
>>>> }
>>>
>>> maybe there is an easier reproducer, something made of 2 TC actions.
>>> The first one goes to a valid chain, and then the second one 
>>> (executed from
>>> within the chain) drops the packet. I think that unpatched CBQ scheduler
>>> will find 'res.class' with a value also there.
>>>
>>>> However, when it returns in the infinite loop handler, it forgets to
>>>> clear whatever is in the `res` variable, which still holds pointers in
>>>> `goto_tp`. As a result, cbq_classify will think it is a valid
>>>> `res.class` and causes type confusion.
>>>>
>>>> My initial proposed patch was to memset `res` before `return
>>>> TC_ACT_SHOT` in `__tcf_classify`, but it didn't get merged. But I
>>>> guess the merged patch is more generic.
>>>
>>> The merged patch looks good to me; however, I wonder if it's sufficient.
>>> If I well read the code, there is still the possibility of hitting the
>>> same problem on a patched kernel when TC_ACT_TRAP / TC_ACT_STOLEN is
>>> returned after a 'goto chain' when the qdisc is CBQ.
>>>
>>> I like Jamal's idea of sharing the reproducer :)
>>
>>
>>> thanks!
>>> -- 
>>> davide
>>>
>>>
>>>
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Question: Patch:("net: sched: cbq: dont intepret cls results when asked to drop") may be not bug for branch LTS 5.10
  2023-01-18 17:00             ` Jamal Hadi Salim
@ 2023-01-19 16:34               ` Davide Caratti
  2023-01-23 19:50                 ` Jamal Hadi Salim
  0 siblings, 1 reply; 12+ messages in thread
From: Davide Caratti @ 2023-01-19 16:34 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Kyle Zeng, shaozhengchao, David Miller, Sasha Levin, Cong Wang,
	Jiri Pirko, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Marcelo Ricardo Leitner, Linux Kernel Network Developers

On Wed, Jan 18, 2023 at 12:00:57PM -0500, Jamal Hadi Salim wrote:

> On Wed, Jan 18, 2023 at 11:00 AM Davide Caratti <dcaratti@redhat.com> wrote:

[...] 

> > With HTB, CBQ, DRR, HFSC, QFQ and ATM it's possible that the TC classifier
> > selects a traffic class for a given packet and writes that pointer in 'res.class'.
> 
> It's their choice of how to implement.

[...] 

> If all goes well, it is a good pointer.

Oh, probably now I see - the assignment in .bind_class() does the magic.

[...]

> > well, the implementation of "goto_chain" actually abuses tcf_result:
> > since it's going to pass the packet to another classifier, it
> > temporarily stores a handle to the next filter in the tcf_result -
> > instead of passing it through the stack. That is not a problem, unless
> > a packet hits the max_reclassify_loop and a CBQ qdisc that dereferences
> > 'res.class' even with a packet drop :)
> >
> 
> The rule is tcf_results should be returning the results and execution
> state to the caller. The goto_chain maybe ok in this case.

it looks ok because the chain is another classifier that re-writes tcf_result
with its own res.class + res.classid. Maybe we should assess what happens
when no classifier matches the packet after a goto_chain (i.e. let's check
if res.class still keeps a pointer to struct tcf_proto). However, tcf_classify()
returns -1 (TC_ACT_UNSPECT) in this case: CBQ and other qdiscs already
ignore tcf_result here.
 
> BTW, I did create a patch initially when this issue surfaced but we
> needed to get something to net first. See attached. The proper way to
> do this is to have the small surgery that returns the errcode instead
> of verdict code and store TC_ACT_XXX in tcf_result (in place of errcode).

my 2 cents: I really would like to see a different skb drop reason for these
packets (currently they are accounted as SKB_DROP_REASON_QDISC_DROP like
regular qdisc drops [1][2]). 

thanks!

[1] https://elixir.bootlin.com/linux/latest/source/net/core/dev.c#L3886
[2] https://lore.kernel.org/netdev/Yt2CIl7iCoahCPoU@pop-os.localdomain/ 

-- 
davide


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Question: Patch:("net: sched: cbq: dont intepret cls results when asked to drop") may be not bug for branch LTS 5.10
  2023-01-19 16:34               ` Davide Caratti
@ 2023-01-23 19:50                 ` Jamal Hadi Salim
  0 siblings, 0 replies; 12+ messages in thread
From: Jamal Hadi Salim @ 2023-01-23 19:50 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Kyle Zeng, shaozhengchao, David Miller, Sasha Levin, Cong Wang,
	Jiri Pirko, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Marcelo Ricardo Leitner, Linux Kernel Network Developers

On Thu, Jan 19, 2023 at 11:34 AM Davide Caratti <dcaratti@redhat.com> wrote:
>
> On Wed, Jan 18, 2023 at 12:00:57PM -0500, Jamal Hadi Salim wrote:
>
> > On Wed, Jan 18, 2023 at 11:00 AM Davide Caratti <dcaratti@redhat.com> wrote:
>
> [...]
>
> > > With HTB, CBQ, DRR, HFSC, QFQ and ATM it's possible that the TC classifier
> > > selects a traffic class for a given packet and writes that pointer in 'res.class'.
> >
> > It's their choice of how to implement.
>
> [...]
>
> > If all goes well, it is a good pointer.
>
> Oh, probably now I see - the assignment in .bind_class() does the magic.
>
> [...]
>
> > > well, the implementation of "goto_chain" actually abuses tcf_result:
> > > since it's going to pass the packet to another classifier, it
> > > temporarily stores a handle to the next filter in the tcf_result -
> > > instead of passing it through the stack. That is not a problem, unless
> > > a packet hits the max_reclassify_loop and a CBQ qdisc that dereferences
> > > 'res.class' even with a packet drop :)
> > >
> >
> > The rule is tcf_results should be returning the results and execution
> > state to the caller. The goto_chain maybe ok in this case.
>
> it looks ok because the chain is another classifier that re-writes tcf_result
> with its own res.class + res.classid. Maybe we should assess what happens
> when no classifier matches the packet after a goto_chain (i.e. let's check
> if res.class still keeps a pointer to struct tcf_proto). However, tcf_classify()
> returns -1 (TC_ACT_UNSPECT) in this case: CBQ and other qdiscs already
> ignore tcf_result here.
>

Yes.

> > BTW, I did create a patch initially when this issue surfaced but we
> > needed to get something to net first. See attached. The proper way to
> > do this is to have the small surgery that returns the errcode instead
> > of verdict code and store TC_ACT_XXX in tcf_result (in place of errcode).
>
> my 2 cents: I really would like to see a different skb drop reason for these
> packets (currently they are accounted as SKB_DROP_REASON_QDISC_DROP like
> regular qdisc drops [1][2]).
>

I should make it clear, that is not the patch that needs to move
forward; it was just
a quick experiment.
I think we can add a lot more fine grained details on the result such
as what you
described above.
I am pre-occupied elsewhere but will get it to it probably by coming weekend..

cheers,
jamal

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-01-23 19:50 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <4538d7d2-0d43-16b7-9f80-77355f08cc61@huawei.com>
2023-01-17  1:03 ` Question: Patch:("net: sched: cbq: dont intepret cls results when asked to drop") may be not bug for branch LTS 5.10 shaozhengchao
     [not found] ` <CAM0EoM=rqF8K997AmC0VDncJ9LeA0PJku2BL96iiatAOiv1-vw@mail.gmail.com>
2023-01-17 19:07   ` Jamal Hadi Salim
2023-01-18  0:10     ` Kyle Zeng
2023-01-18 11:06       ` Davide Caratti
2023-01-18 13:06         ` Jamal Hadi Salim
2023-01-18 16:00           ` Davide Caratti
2023-01-18 17:00             ` Jamal Hadi Salim
2023-01-19 16:34               ` Davide Caratti
2023-01-23 19:50                 ` Jamal Hadi Salim
2023-01-19  6:33           ` shaozhengchao
2023-01-19  7:00             ` shaozhengchao
2023-01-19  3:41       ` shaozhengchao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).