linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 net 0/2] Fixes for tc-taprio software mode
@ 2022-09-15 10:08 Vladimir Oltean
  2022-09-15 10:08 ` [PATCH v2 net 1/2] net/sched: taprio: avoid disabling offload when it was never enabled Vladimir Oltean
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Vladimir Oltean @ 2022-09-15 10:08 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Voon Weifeng, Vladimir Oltean, Kurt Kanzenbach, linux-kernel

While working on some new features for tc-taprio, I found some strange
behavior which looked like bugs. I was able to eventually trigger a NULL
pointer dereference. This patch set fixes 2 issues I saw. Detailed
explanation in patches.

Changes in v2: dropped patch 3/3 (will resend to net-next).

Vladimir Oltean (2):
  net/sched: taprio: avoid disabling offload when it was never enabled
  net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo
    child qdiscs

 net/sched/sch_taprio.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 net 1/2] net/sched: taprio: avoid disabling offload when it was never enabled
  2022-09-15 10:08 [PATCH v2 net 0/2] Fixes for tc-taprio software mode Vladimir Oltean
@ 2022-09-15 10:08 ` Vladimir Oltean
  2022-09-15 21:52   ` Vinicius Costa Gomes
  2022-09-15 10:08 ` [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs Vladimir Oltean
  2022-09-20 19:00 ` [PATCH v2 net 0/2] Fixes for tc-taprio software mode patchwork-bot+netdevbpf
  2 siblings, 1 reply; 10+ messages in thread
From: Vladimir Oltean @ 2022-09-15 10:08 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Voon Weifeng, Vladimir Oltean, Kurt Kanzenbach, linux-kernel

In an incredibly strange API design decision, qdisc->destroy() gets
called even if qdisc->init() never succeeded, not exclusively since
commit 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation"),
but apparently also earlier (in the case of qdisc_create_dflt()).

The taprio qdisc does not fully acknowledge this when it attempts full
offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in
taprio_init(), then it replaces q->flags with TCA_TAPRIO_ATTR_FLAGS
parsed from netlink (in taprio_change(), tail called from taprio_init()).

But in taprio_destroy(), we call taprio_disable_offload(), and this
determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags).

But looking at the implementation of FULL_OFFLOAD_IS_ENABLED()
(a bitwise check of bit 1 in q->flags), it is invalid to call this macro
on q->flags when it contains TAPRIO_FLAGS_INVALID, because that is set
to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on
an invalid set of flags.

As a result, it is possible to crash the kernel if user space forces an
error between setting q->flags = TAPRIO_FLAGS_INVALID, and the calling
of taprio_enable_offload(). This is because drivers do not expect the
offload to be disabled when it was never enabled.

The error that we force here is to attach taprio as a non-root qdisc,
but instead as child of an mqprio root qdisc:

$ tc qdisc add dev swp0 root handle 1: \
	mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
$ tc qdisc replace dev swp0 parent 1:1 \
	taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
	sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
	flags 0x0 clockid CLOCK_TAI
Unable to handle kernel paging request at virtual address fffffffffffffff8
[fffffffffffffff8] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Call trace:
 taprio_dump+0x27c/0x310
 vsc9959_port_setup_tc+0x1f4/0x460
 felix_port_setup_tc+0x24/0x3c
 dsa_slave_setup_tc+0x54/0x27c
 taprio_disable_offload.isra.0+0x58/0xe0
 taprio_destroy+0x80/0x104
 qdisc_create+0x240/0x470
 tc_modify_qdisc+0x1fc/0x6b0
 rtnetlink_rcv_msg+0x12c/0x390
 netlink_rcv_skb+0x5c/0x130
 rtnetlink_rcv+0x1c/0x2c

Fix this by keeping track of the operations we made, and undo the
offload only if we actually did it.

I've added "bool offloaded" inside a 4 byte hole between "int clockid"
and "atomic64_t picos_per_byte". Now the first cache line looks like
below:

$ pahole -C taprio_sched net/sched/sch_taprio.o
struct taprio_sched {
        struct Qdisc * *           qdiscs;               /*     0     8 */
        struct Qdisc *             root;                 /*     8     8 */
        u32                        flags;                /*    16     4 */
        enum tk_offsets            tk_offset;            /*    20     4 */
        int                        clockid;              /*    24     4 */
        bool                       offloaded;            /*    28     1 */

        /* XXX 3 bytes hole, try to pack */

        atomic64_t                 picos_per_byte;       /*    32     0 */

        /* XXX 8 bytes hole, try to pack */

        spinlock_t                 current_entry_lock;   /*    40     0 */

        /* XXX 8 bytes hole, try to pack */

        struct sched_entry *       current_entry;        /*    48     8 */
        struct sched_gate_list *   oper_sched;           /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */

Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
v1->v2: none

 net/sched/sch_taprio.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index db88a692ef81..a3b4f92a9937 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -67,6 +67,7 @@ struct taprio_sched {
 	u32 flags;
 	enum tk_offsets tk_offset;
 	int clockid;
+	bool offloaded;
 	atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+
 				    * speeds it's sub-nanoseconds per byte
 				    */
@@ -1279,6 +1280,8 @@ static int taprio_enable_offload(struct net_device *dev,
 		goto done;
 	}
 
+	q->offloaded = true;
+
 done:
 	taprio_offload_free(offload);
 
@@ -1293,12 +1296,9 @@ static int taprio_disable_offload(struct net_device *dev,
 	struct tc_taprio_qopt_offload *offload;
 	int err;
 
-	if (!FULL_OFFLOAD_IS_ENABLED(q->flags))
+	if (!q->offloaded)
 		return 0;
 
-	if (!ops->ndo_setup_tc)
-		return -EOPNOTSUPP;
-
 	offload = taprio_offload_alloc(0);
 	if (!offload) {
 		NL_SET_ERR_MSG(extack,
@@ -1314,6 +1314,8 @@ static int taprio_disable_offload(struct net_device *dev,
 		goto out;
 	}
 
+	q->offloaded = false;
+
 out:
 	taprio_offload_free(offload);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs
  2022-09-15 10:08 [PATCH v2 net 0/2] Fixes for tc-taprio software mode Vladimir Oltean
  2022-09-15 10:08 ` [PATCH v2 net 1/2] net/sched: taprio: avoid disabling offload when it was never enabled Vladimir Oltean
@ 2022-09-15 10:08 ` Vladimir Oltean
  2022-09-15 21:54   ` Vinicius Costa Gomes
  2023-02-22 14:03   ` Kurt Kanzenbach
  2022-09-20 19:00 ` [PATCH v2 net 0/2] Fixes for tc-taprio software mode patchwork-bot+netdevbpf
  2 siblings, 2 replies; 10+ messages in thread
From: Vladimir Oltean @ 2022-09-15 10:08 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Voon Weifeng, Vladimir Oltean, Kurt Kanzenbach, linux-kernel

taprio can only operate as root qdisc, and to that end, there exists the
following check in taprio_init(), just as in mqprio:

	if (sch->parent != TC_H_ROOT)
		return -EOPNOTSUPP;

And indeed, when we try to attach taprio to an mqprio child, it fails as
expected:

$ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
$ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
	flags 0x0 clockid CLOCK_TAI
Error: sch_taprio: Can only be attached as root qdisc.

(extack message added by me)

But when we try to attach a taprio child to a taprio root qdisc,
surprisingly it doesn't fail:

$ tc qdisc replace dev swp0 root handle 1: taprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
	flags 0x0 clockid CLOCK_TAI
$ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
	flags 0x0 clockid CLOCK_TAI

This is because tc_modify_qdisc() behaves differently when mqprio is
root, vs when taprio is root.

In the mqprio case, it finds the parent qdisc through
p = qdisc_lookup(dev, TC_H_MAJ(clid)), and then the child qdisc through
q = qdisc_leaf(p, clid). This leaf qdisc q has handle 0, so it is
ignored according to the comment right below ("It may be default qdisc,
ignore it"). As a result, tc_modify_qdisc() goes through the
qdisc_create() code path, and this gives taprio_init() a chance to check
for sch_parent != TC_H_ROOT and error out.

Whereas in the taprio case, the returned q = qdisc_leaf(p, clid) is
different. It is not the default qdisc created for each netdev queue
(both taprio and mqprio call qdisc_create_dflt() and keep them in
a private q->qdiscs[], or priv->qdiscs[], respectively). Instead, taprio
makes qdisc_leaf() return the _root_ qdisc, aka itself.

When taprio does that, tc_modify_qdisc() goes through the qdisc_change()
code path, because the qdisc layer never finds out about the child qdisc
of the root. And through the ->change() ops, taprio has no reason to
check whether its parent is root or not, just through ->init(), which is
not called.

The problem is the taprio_leaf() implementation. Even though code wise,
it does the exact same thing as mqprio_leaf() which it is copied from,
it works with different input data. This is because mqprio does not
attach itself (the root) to each device TX queue, but one of the default
qdiscs from its private array.

In fact, since commit 13511704f8d7 ("net: taprio offload: enforce qdisc
to netdev queue mapping"), taprio does this too, but just for the full
offload case. So if we tried to attach a taprio child to a fully
offloaded taprio root qdisc, it would properly fail too; just not to a
software root taprio.

To fix the problem, stop looking at the Qdisc that's attached to the TX
queue, and instead, always return the default qdiscs that we've
allocated (and to which we privately enqueue and dequeue, in software
scheduling mode).

Since Qdisc_class_ops :: leaf  is only called from tc_modify_qdisc(),
the risk of unforeseen side effects introduced by this change is
minimal.

Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
v1->v2: none

 net/sched/sch_taprio.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index a3b4f92a9937..5bffc37022e0 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1949,12 +1949,14 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 
 static struct Qdisc *taprio_leaf(struct Qdisc *sch, unsigned long cl)
 {
-	struct netdev_queue *dev_queue = taprio_queue_get(sch, cl);
+	struct taprio_sched *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+	unsigned int ntx = cl - 1;
 
-	if (!dev_queue)
+	if (ntx >= dev->num_tx_queues)
 		return NULL;
 
-	return dev_queue->qdisc_sleeping;
+	return q->qdiscs[ntx];
 }
 
 static unsigned long taprio_find(struct Qdisc *sch, u32 classid)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 net 1/2] net/sched: taprio: avoid disabling offload when it was never enabled
  2022-09-15 10:08 ` [PATCH v2 net 1/2] net/sched: taprio: avoid disabling offload when it was never enabled Vladimir Oltean
@ 2022-09-15 21:52   ` Vinicius Costa Gomes
  0 siblings, 0 replies; 10+ messages in thread
From: Vinicius Costa Gomes @ 2022-09-15 21:52 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Voon Weifeng,
	Vladimir Oltean, Kurt Kanzenbach, linux-kernel

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> In an incredibly strange API design decision, qdisc->destroy() gets
> called even if qdisc->init() never succeeded, not exclusively since
> commit 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation"),
> but apparently also earlier (in the case of qdisc_create_dflt()).
>
> The taprio qdisc does not fully acknowledge this when it attempts full
> offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in
> taprio_init(), then it replaces q->flags with TCA_TAPRIO_ATTR_FLAGS
> parsed from netlink (in taprio_change(), tail called from taprio_init()).
>
> But in taprio_destroy(), we call taprio_disable_offload(), and this
> determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags).
>
> But looking at the implementation of FULL_OFFLOAD_IS_ENABLED()
> (a bitwise check of bit 1 in q->flags), it is invalid to call this macro
> on q->flags when it contains TAPRIO_FLAGS_INVALID, because that is set
> to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on
> an invalid set of flags.
>
> As a result, it is possible to crash the kernel if user space forces an
> error between setting q->flags = TAPRIO_FLAGS_INVALID, and the calling
> of taprio_enable_offload(). This is because drivers do not expect the
> offload to be disabled when it was never enabled.
>
> The error that we force here is to attach taprio as a non-root qdisc,
> but instead as child of an mqprio root qdisc:
>
> $ tc qdisc add dev swp0 root handle 1: \
> 	mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
> 	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
> $ tc qdisc replace dev swp0 parent 1:1 \
> 	taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
> 	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
> 	sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> 	flags 0x0 clockid CLOCK_TAI
> Unable to handle kernel paging request at virtual address fffffffffffffff8
> [fffffffffffffff8] pgd=0000000000000000, p4d=0000000000000000
> Internal error: Oops: 96000004 [#1] PREEMPT SMP
> Call trace:
>  taprio_dump+0x27c/0x310
>  vsc9959_port_setup_tc+0x1f4/0x460
>  felix_port_setup_tc+0x24/0x3c
>  dsa_slave_setup_tc+0x54/0x27c
>  taprio_disable_offload.isra.0+0x58/0xe0
>  taprio_destroy+0x80/0x104
>  qdisc_create+0x240/0x470
>  tc_modify_qdisc+0x1fc/0x6b0
>  rtnetlink_rcv_msg+0x12c/0x390
>  netlink_rcv_skb+0x5c/0x130
>  rtnetlink_rcv+0x1c/0x2c
>
> Fix this by keeping track of the operations we made, and undo the
> offload only if we actually did it.
>
> I've added "bool offloaded" inside a 4 byte hole between "int clockid"
> and "atomic64_t picos_per_byte". Now the first cache line looks like
> below:
>
> $ pahole -C taprio_sched net/sched/sch_taprio.o
> struct taprio_sched {
>         struct Qdisc * *           qdiscs;               /*     0     8 */
>         struct Qdisc *             root;                 /*     8     8 */
>         u32                        flags;                /*    16     4 */
>         enum tk_offsets            tk_offset;            /*    20     4 */
>         int                        clockid;              /*    24     4 */
>         bool                       offloaded;            /*    28     1 */
>
>         /* XXX 3 bytes hole, try to pack */
>
>         atomic64_t                 picos_per_byte;       /*    32     0 */
>
>         /* XXX 8 bytes hole, try to pack */
>
>         spinlock_t                 current_entry_lock;   /*    40     0 */
>
>         /* XXX 8 bytes hole, try to pack */
>
>         struct sched_entry *       current_entry;        /*    48     8 */
>         struct sched_gate_list *   oper_sched;           /*    56     8 */
>         /* --- cacheline 1 boundary (64 bytes) --- */
>
> Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---

Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>


Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs
  2022-09-15 10:08 ` [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs Vladimir Oltean
@ 2022-09-15 21:54   ` Vinicius Costa Gomes
  2023-02-22 14:03   ` Kurt Kanzenbach
  1 sibling, 0 replies; 10+ messages in thread
From: Vinicius Costa Gomes @ 2022-09-15 21:54 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Voon Weifeng,
	Vladimir Oltean, Kurt Kanzenbach, linux-kernel

Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> taprio can only operate as root qdisc, and to that end, there exists the
> following check in taprio_init(), just as in mqprio:
>
> 	if (sch->parent != TC_H_ROOT)
> 		return -EOPNOTSUPP;
>
> And indeed, when we try to attach taprio to an mqprio child, it fails as
> expected:
>
> $ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \
> 	map 0 1 2 3 4 5 6 7 \
> 	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
> $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
> 	map 0 1 2 3 4 5 6 7 \
> 	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> 	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> 	flags 0x0 clockid CLOCK_TAI
> Error: sch_taprio: Can only be attached as root qdisc.
>
> (extack message added by me)
>
> But when we try to attach a taprio child to a taprio root qdisc,
> surprisingly it doesn't fail:
>
> $ tc qdisc replace dev swp0 root handle 1: taprio num_tc 8 \
> 	map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> 	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> 	flags 0x0 clockid CLOCK_TAI
> $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
> 	map 0 1 2 3 4 5 6 7 \
> 	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> 	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> 	flags 0x0 clockid CLOCK_TAI
>
> This is because tc_modify_qdisc() behaves differently when mqprio is
> root, vs when taprio is root.
>
> In the mqprio case, it finds the parent qdisc through
> p = qdisc_lookup(dev, TC_H_MAJ(clid)), and then the child qdisc through
> q = qdisc_leaf(p, clid). This leaf qdisc q has handle 0, so it is
> ignored according to the comment right below ("It may be default qdisc,
> ignore it"). As a result, tc_modify_qdisc() goes through the
> qdisc_create() code path, and this gives taprio_init() a chance to check
> for sch_parent != TC_H_ROOT and error out.
>
> Whereas in the taprio case, the returned q = qdisc_leaf(p, clid) is
> different. It is not the default qdisc created for each netdev queue
> (both taprio and mqprio call qdisc_create_dflt() and keep them in
> a private q->qdiscs[], or priv->qdiscs[], respectively). Instead, taprio
> makes qdisc_leaf() return the _root_ qdisc, aka itself.
>
> When taprio does that, tc_modify_qdisc() goes through the qdisc_change()
> code path, because the qdisc layer never finds out about the child qdisc
> of the root. And through the ->change() ops, taprio has no reason to
> check whether its parent is root or not, just through ->init(), which is
> not called.
>
> The problem is the taprio_leaf() implementation. Even though code wise,
> it does the exact same thing as mqprio_leaf() which it is copied from,
> it works with different input data. This is because mqprio does not
> attach itself (the root) to each device TX queue, but one of the default
> qdiscs from its private array.
>
> In fact, since commit 13511704f8d7 ("net: taprio offload: enforce qdisc
> to netdev queue mapping"), taprio does this too, but just for the full
> offload case. So if we tried to attach a taprio child to a fully
> offloaded taprio root qdisc, it would properly fail too; just not to a
> software root taprio.
>
> To fix the problem, stop looking at the Qdisc that's attached to the TX
> queue, and instead, always return the default qdiscs that we've
> allocated (and to which we privately enqueue and dequeue, in software
> scheduling mode).
>
> Since Qdisc_class_ops :: leaf  is only called from tc_modify_qdisc(),
> the risk of unforeseen side effects introduced by this change is
> minimal.
>
> Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---

I spent a long time trying to think of cases that this would break,
could not think of anything:


Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>


Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 net 0/2] Fixes for tc-taprio software mode
  2022-09-15 10:08 [PATCH v2 net 0/2] Fixes for tc-taprio software mode Vladimir Oltean
  2022-09-15 10:08 ` [PATCH v2 net 1/2] net/sched: taprio: avoid disabling offload when it was never enabled Vladimir Oltean
  2022-09-15 10:08 ` [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs Vladimir Oltean
@ 2022-09-20 19:00 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-09-20 19:00 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, vinicius.gomes, jhs, xiyou.wangcong, jiri, davem,
	edumazet, kuba, pabeni, weifeng.voon, olteanv, kurt,
	linux-kernel

Hello:

This series was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 15 Sep 2022 13:08:00 +0300 you wrote:
> While working on some new features for tc-taprio, I found some strange
> behavior which looked like bugs. I was able to eventually trigger a NULL
> pointer dereference. This patch set fixes 2 issues I saw. Detailed
> explanation in patches.
> 
> Changes in v2: dropped patch 3/3 (will resend to net-next).
> 
> [...]

Here is the summary with links:
  - [v2,net,1/2] net/sched: taprio: avoid disabling offload when it was never enabled
    https://git.kernel.org/netdev/net/c/db46e3a88a09
  - [v2,net,2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs
    https://git.kernel.org/netdev/net/c/1461d212ab27

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs
  2022-09-15 10:08 ` [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs Vladimir Oltean
  2022-09-15 21:54   ` Vinicius Costa Gomes
@ 2023-02-22 14:03   ` Kurt Kanzenbach
  2023-02-22 14:25     ` Vladimir Oltean
  1 sibling, 1 reply; 10+ messages in thread
From: Kurt Kanzenbach @ 2023-02-22 14:03 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Voon Weifeng, Vladimir Oltean, linux-kernel,
	Sebastian Andrzej Siewior

[-- Attachment #1: Type: text/plain, Size: 4262 bytes --]

Hi Vladimir,

On Thu Sep 15 2022, Vladimir Oltean wrote:
> taprio can only operate as root qdisc, and to that end, there exists the
> following check in taprio_init(), just as in mqprio:
>
> 	if (sch->parent != TC_H_ROOT)
> 		return -EOPNOTSUPP;
>
> And indeed, when we try to attach taprio to an mqprio child, it fails as
> expected:
>
> $ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \
> 	map 0 1 2 3 4 5 6 7 \
> 	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
> $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
> 	map 0 1 2 3 4 5 6 7 \
> 	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> 	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> 	flags 0x0 clockid CLOCK_TAI
> Error: sch_taprio: Can only be attached as root qdisc.
>
> (extack message added by me)
>
> But when we try to attach a taprio child to a taprio root qdisc,
> surprisingly it doesn't fail:
>
> $ tc qdisc replace dev swp0 root handle 1: taprio num_tc 8 \
> 	map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> 	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> 	flags 0x0 clockid CLOCK_TAI
> $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
> 	map 0 1 2 3 4 5 6 7 \
> 	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> 	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
> 	flags 0x0 clockid CLOCK_TAI
>
> This is because tc_modify_qdisc() behaves differently when mqprio is
> root, vs when taprio is root.
>
> In the mqprio case, it finds the parent qdisc through
> p = qdisc_lookup(dev, TC_H_MAJ(clid)), and then the child qdisc through
> q = qdisc_leaf(p, clid). This leaf qdisc q has handle 0, so it is
> ignored according to the comment right below ("It may be default qdisc,
> ignore it"). As a result, tc_modify_qdisc() goes through the
> qdisc_create() code path, and this gives taprio_init() a chance to check
> for sch_parent != TC_H_ROOT and error out.
>
> Whereas in the taprio case, the returned q = qdisc_leaf(p, clid) is
> different. It is not the default qdisc created for each netdev queue
> (both taprio and mqprio call qdisc_create_dflt() and keep them in
> a private q->qdiscs[], or priv->qdiscs[], respectively). Instead, taprio
> makes qdisc_leaf() return the _root_ qdisc, aka itself.
>
> When taprio does that, tc_modify_qdisc() goes through the qdisc_change()
> code path, because the qdisc layer never finds out about the child qdisc
> of the root. And through the ->change() ops, taprio has no reason to
> check whether its parent is root or not, just through ->init(), which is
> not called.
>
> The problem is the taprio_leaf() implementation. Even though code wise,
> it does the exact same thing as mqprio_leaf() which it is copied from,
> it works with different input data. This is because mqprio does not
> attach itself (the root) to each device TX queue, but one of the default
> qdiscs from its private array.
>
> In fact, since commit 13511704f8d7 ("net: taprio offload: enforce qdisc
> to netdev queue mapping"), taprio does this too, but just for the full
> offload case. So if we tried to attach a taprio child to a fully
> offloaded taprio root qdisc, it would properly fail too; just not to a
> software root taprio.
>
> To fix the problem, stop looking at the Qdisc that's attached to the TX
> queue, and instead, always return the default qdiscs that we've
> allocated (and to which we privately enqueue and dequeue, in software
> scheduling mode).
>
> Since Qdisc_class_ops :: leaf  is only called from tc_modify_qdisc(),
> the risk of unforeseen side effects introduced by this change is
> minimal.
>
> Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

This commit was backported to v5.15-LTS which results in NULL pointer
dereferences e.g., when attaching an ETF child qdisc to taprio.

From what I can see is, that the issue was reported back then and this
commit was reverted [1]. However, the revert didn't make it into
v5.15-LTS? Is there a reason for it? I'm testing 5.15.94-rt59 here.

Thanks,
Kurt

[1] - https://lore.kernel.org/all/20221004220100.1650558-1-vladimir.oltean@nxp.com/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs
  2023-02-22 14:03   ` Kurt Kanzenbach
@ 2023-02-22 14:25     ` Vladimir Oltean
  2023-02-23 13:03       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 10+ messages in thread
From: Vladimir Oltean @ 2023-02-22 14:25 UTC (permalink / raw)
  To: Kurt Kanzenbach, Greg Kroah-Hartman, Sasha Levin
  Cc: Vladimir Oltean, netdev, Vinicius Costa Gomes, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Voon Weifeng, linux-kernel,
	Sebastian Andrzej Siewior

+Greg, Sasha.

Hi Kurt,

On Wed, Feb 22, 2023 at 03:03:04PM +0100, Kurt Kanzenbach wrote:
> This commit was backported to v5.15-LTS which results in NULL pointer
> dereferences e.g., when attaching an ETF child qdisc to taprio.
> 
> From what I can see is, that the issue was reported back then and this
> commit was reverted [1]. However, the revert didn't make it into
> v5.15-LTS? Is there a reason for it? I'm testing 5.15.94-rt59 here.
> 
> Thanks,
> Kurt
> 
> [1] - https://lore.kernel.org/all/20221004220100.1650558-1-vladimir.oltean@nxp.com/

You are right; the patchwork-bot clearly says that the revert was
applied to net.git as commit af7b29b1deaa ("Revert "net/sched: taprio:
make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs""), but
the revert never made it to stable.

OTOH, the original patch did make it to, and still is in, linux-stable.
I have backport notification emails of the original to 5.4, 5.10, 5.15, 5.19.

Greg, Sasha, can you please pick up and backport commit af7b29b1deaa
("Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue
pfifo child qdiscs"") to the currently maintained stable kernels?

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs
  2023-02-22 14:25     ` Vladimir Oltean
@ 2023-02-23 13:03       ` Greg Kroah-Hartman
  2023-02-23 13:18         ` Vladimir Oltean
  0 siblings, 1 reply; 10+ messages in thread
From: Greg Kroah-Hartman @ 2023-02-23 13:03 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Kurt Kanzenbach, Sasha Levin, Vladimir Oltean, netdev,
	Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Voon Weifeng, linux-kernel, Sebastian Andrzej Siewior

On Wed, Feb 22, 2023 at 04:25:07PM +0200, Vladimir Oltean wrote:
> +Greg, Sasha.

Hint, in the future, please cc: stable@vger.kernel.org with stable
kernel requests, I know I don't see stuff that isn't cc:ed there
normally when working on that tree.

> 
> Hi Kurt,
> 
> On Wed, Feb 22, 2023 at 03:03:04PM +0100, Kurt Kanzenbach wrote:
> > This commit was backported to v5.15-LTS which results in NULL pointer
> > dereferences e.g., when attaching an ETF child qdisc to taprio.
> > 
> > From what I can see is, that the issue was reported back then and this
> > commit was reverted [1]. However, the revert didn't make it into
> > v5.15-LTS? Is there a reason for it? I'm testing 5.15.94-rt59 here.
> > 
> > Thanks,
> > Kurt
> > 
> > [1] - https://lore.kernel.org/all/20221004220100.1650558-1-vladimir.oltean@nxp.com/
> 
> You are right; the patchwork-bot clearly says that the revert was
> applied to net.git as commit af7b29b1deaa ("Revert "net/sched: taprio:
> make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs""), but
> the revert never made it to stable.
> 
> OTOH, the original patch did make it to, and still is in, linux-stable.
> I have backport notification emails of the original to 5.4, 5.10, 5.15, 5.19.
> 
> Greg, Sasha, can you please pick up and backport commit af7b29b1deaa
> ("Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue
> pfifo child qdiscs"") to the currently maintained stable kernels?

Now queued up, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs
  2023-02-23 13:03       ` Greg Kroah-Hartman
@ 2023-02-23 13:18         ` Vladimir Oltean
  0 siblings, 0 replies; 10+ messages in thread
From: Vladimir Oltean @ 2023-02-23 13:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Vladimir Oltean, Kurt Kanzenbach, Sasha Levin, netdev,
	Vinicius Costa Gomes, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Voon Weifeng, linux-kernel, Sebastian Andrzej Siewior

On Thu, Feb 23, 2023 at 02:03:17PM +0100, Greg Kroah-Hartman wrote:
> On Wed, Feb 22, 2023 at 04:25:07PM +0200, Vladimir Oltean wrote:
> > +Greg, Sasha.
> 
> Hint, in the future, please cc: stable@vger.kernel.org with stable
> kernel requests, I know I don't see stuff that isn't cc:ed there
> normally when working on that tree.

Ok, sorry, I got ahead of myself.

> > Greg, Sasha, can you please pick up and backport commit af7b29b1deaa
> > ("Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue
> > pfifo child qdiscs"") to the currently maintained stable kernels?
> 
> Now queued up, thanks.

Thanks! I got email notifications for 5.4, 5.10 and 5.15.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-02-23 13:19 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-15 10:08 [PATCH v2 net 0/2] Fixes for tc-taprio software mode Vladimir Oltean
2022-09-15 10:08 ` [PATCH v2 net 1/2] net/sched: taprio: avoid disabling offload when it was never enabled Vladimir Oltean
2022-09-15 21:52   ` Vinicius Costa Gomes
2022-09-15 10:08 ` [PATCH v2 net 2/2] net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs Vladimir Oltean
2022-09-15 21:54   ` Vinicius Costa Gomes
2023-02-22 14:03   ` Kurt Kanzenbach
2023-02-22 14:25     ` Vladimir Oltean
2023-02-23 13:03       ` Greg Kroah-Hartman
2023-02-23 13:18         ` Vladimir Oltean
2022-09-20 19:00 ` [PATCH v2 net 0/2] Fixes for tc-taprio software mode patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).