linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] taprio: fix panic while hw offload sched list swap
@ 2019-11-01 23:28 Ivan Khoronzhuk
  2019-11-04 21:20 ` Vinicius Costa Gomes
  2019-11-05 21:59 ` David Miller
  0 siblings, 2 replies; 4+ messages in thread
From: Ivan Khoronzhuk @ 2019-11-01 23:28 UTC (permalink / raw)
  To: davem, vinicius.gomes
  Cc: jhs, xiyou.wangcong, jiri, netdev, linux-kernel, Ivan Khoronzhuk

Don't swap oper and admin schedules too early, it's not correct and
causes crash.

Steps to reproduce:

1)
tc qdisc replace dev eth0 parent root handle 100 taprio \
    num_tc 3 \
    map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
    queues 1@0 1@1 1@2 \
    base-time $SOME_BASE_TIME \
    sched-entry S 01 80000 \
    sched-entry S 02 15000 \
    sched-entry S 04 40000 \
    flags 2

2)
tc qdisc replace dev eth0 parent root handle 100 taprio \
    base-time $SOME_BASE_TIME \
    sched-entry S 01 90000 \
    sched-entry S 02 20000 \
    sched-entry S 04 40000 \
    flags 2

3)
tc qdisc replace dev eth0 parent root handle 100 taprio \
    base-time $SOME_BASE_TIME \
    sched-entry S 01 150000 \
    sched-entry S 02 200000 \
    sched-entry S 04 40000 \
    flags 2

Do 2 3 2 .. steps  more times if not happens and observe:

[  305.832319] Unable to handle kernel write to read-only memory at
virtual address ffff0000087ce7f0
[  305.910887] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
[  305.919306] Hardware name: Texas Instruments AM654 Base Board (DT)

[...]

[  306.017119] x1 : ffff800848031d88 x0 : ffff800848031d80
[  306.022422] Call trace:
[  306.024866]  taprio_free_sched_cb+0x4c/0x98
[  306.029040]  rcu_process_callbacks+0x25c/0x410
[  306.033476]  __do_softirq+0x10c/0x208
[  306.037132]  irq_exit+0xb8/0xc8
[  306.040267]  __handle_domain_irq+0x64/0xb8
[  306.044352]  gic_handle_irq+0x7c/0x178
[  306.048092]  el1_irq+0xb0/0x128
[  306.051227]  arch_cpu_idle+0x10/0x18
[  306.054795]  do_idle+0x120/0x138
[  306.058015]  cpu_startup_entry+0x20/0x28
[  306.061931]  rest_init+0xcc/0xd8
[  306.065154]  start_kernel+0x3bc/0x3e4
[  306.068810] Code: f2fbd5b7 f2fbd5b6 d503201f f9400422 (f9000662)
[  306.074900] ---[ end trace 96c8e2284a9d9d6e ]---
[  306.079507] Kernel panic - not syncing: Fatal exception in interrupt
[  306.085847] SMP: stopping secondary CPUs
[  306.089765] Kernel Offset: disabled

Try to explain one of the possible crash cases:

The "real" admin list is assigned when admin_sched is set to
new_admin, it happens after "swap", that assigns to oper_sched NULL.
Thus if call qdisc show it can crash.

Farther, next second time, when sched list is updated, the admin_sched
is not NULL and becomes the oper_sched, previous oper_sched was NULL so
just skipped. But then admin_sched is assigned new_admin, but schedules
to free previous assigned admin_sched (that already became oper_sched).

Farther, next third time, when sched list is updated,
while one more swap, oper_sched is not null, but it was happy to be
freed already (while prev. admin update), so while try to free
oper_sched the kernel panic happens at taprio_free_sched_cb().

So, move the "swap emulation" where it should be according to function
comment from code.

Fixes: 9c66d15646760e ("taprio: Add support for hardware offloading")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---

Based on net/master

 net/sched/sch_taprio.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 6719a65169d4..286ad4bb50f1 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1224,8 +1224,6 @@ static int taprio_enable_offload(struct net_device *dev,
 		goto done;
 	}
 
-	taprio_offload_config_changed(q);
-
 done:
 	taprio_offload_free(offload);
 
@@ -1505,6 +1503,9 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 			call_rcu(&admin->rcu, taprio_free_sched_cb);
 
 		spin_unlock_irqrestore(&q->current_entry_lock, flags);
+
+		if (FULL_OFFLOAD_IS_ENABLED(taprio_flags))
+			taprio_offload_config_changed(q);
 	}
 
 	new_admin = NULL;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net] taprio: fix panic while hw offload sched list swap
  2019-11-01 23:28 [PATCH net] taprio: fix panic while hw offload sched list swap Ivan Khoronzhuk
@ 2019-11-04 21:20 ` Vinicius Costa Gomes
  2019-11-04 22:56   ` Vladimir Oltean
  2019-11-05 21:59 ` David Miller
  1 sibling, 1 reply; 4+ messages in thread
From: Vinicius Costa Gomes @ 2019-11-04 21:20 UTC (permalink / raw)
  To: Ivan Khoronzhuk, davem
  Cc: jhs, xiyou.wangcong, jiri, netdev, linux-kernel, Ivan Khoronzhuk,
	Vladimir Oltean

Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> writes:

> Don't swap oper and admin schedules too early, it's not correct and
> causes crash.
>
> Steps to reproduce:
>
> 1)
> tc qdisc replace dev eth0 parent root handle 100 taprio \
>     num_tc 3 \
>     map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>     queues 1@0 1@1 1@2 \
>     base-time $SOME_BASE_TIME \
>     sched-entry S 01 80000 \
>     sched-entry S 02 15000 \
>     sched-entry S 04 40000 \
>     flags 2
>
> 2)
> tc qdisc replace dev eth0 parent root handle 100 taprio \
>     base-time $SOME_BASE_TIME \
>     sched-entry S 01 90000 \
>     sched-entry S 02 20000 \
>     sched-entry S 04 40000 \
>     flags 2
>
> 3)
> tc qdisc replace dev eth0 parent root handle 100 taprio \
>     base-time $SOME_BASE_TIME \
>     sched-entry S 01 150000 \
>     sched-entry S 02 200000 \
>     sched-entry S 04 40000 \
>     flags 2
>
> Do 2 3 2 .. steps  more times if not happens and observe:
>
> [  305.832319] Unable to handle kernel write to read-only memory at
> virtual address ffff0000087ce7f0
> [  305.910887] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> [  305.919306] Hardware name: Texas Instruments AM654 Base Board (DT)
>
> [...]
>
> [  306.017119] x1 : ffff800848031d88 x0 : ffff800848031d80
> [  306.022422] Call trace:
> [  306.024866]  taprio_free_sched_cb+0x4c/0x98
> [  306.029040]  rcu_process_callbacks+0x25c/0x410
> [  306.033476]  __do_softirq+0x10c/0x208
> [  306.037132]  irq_exit+0xb8/0xc8
> [  306.040267]  __handle_domain_irq+0x64/0xb8
> [  306.044352]  gic_handle_irq+0x7c/0x178
> [  306.048092]  el1_irq+0xb0/0x128
> [  306.051227]  arch_cpu_idle+0x10/0x18
> [  306.054795]  do_idle+0x120/0x138
> [  306.058015]  cpu_startup_entry+0x20/0x28
> [  306.061931]  rest_init+0xcc/0xd8
> [  306.065154]  start_kernel+0x3bc/0x3e4
> [  306.068810] Code: f2fbd5b7 f2fbd5b6 d503201f f9400422 (f9000662)
> [  306.074900] ---[ end trace 96c8e2284a9d9d6e ]---
> [  306.079507] Kernel panic - not syncing: Fatal exception in interrupt
> [  306.085847] SMP: stopping secondary CPUs
> [  306.089765] Kernel Offset: disabled
>
> Try to explain one of the possible crash cases:
>
> The "real" admin list is assigned when admin_sched is set to
> new_admin, it happens after "swap", that assigns to oper_sched NULL.
> Thus if call qdisc show it can crash.
>
> Farther, next second time, when sched list is updated, the admin_sched
> is not NULL and becomes the oper_sched, previous oper_sched was NULL so
> just skipped. But then admin_sched is assigned new_admin, but schedules
> to free previous assigned admin_sched (that already became oper_sched).
>
> Farther, next third time, when sched list is updated,
> while one more swap, oper_sched is not null, but it was happy to be
> freed already (while prev. admin update), so while try to free
> oper_sched the kernel panic happens at taprio_free_sched_cb().
>
> So, move the "swap emulation" where it should be according to function
> comment from code.
>
> Fixes: 9c66d15646760e ("taprio: Add support for hardware offloading")
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> ---

As it solves a crash, and I have no problems with this fix:

Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>

But reading the code, I got the feeling that upstream "swap emulation"
part of the code is not working as it should, perhaps it was lost during
upstreaming of the patch? Vladimir, can you confirm that this works for
you? (yeah, this can be solved later)


Cheers,
--
Vinicius


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] taprio: fix panic while hw offload sched list swap
  2019-11-04 21:20 ` Vinicius Costa Gomes
@ 2019-11-04 22:56   ` Vladimir Oltean
  0 siblings, 0 replies; 4+ messages in thread
From: Vladimir Oltean @ 2019-11-04 22:56 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: Ivan Khoronzhuk, David S. Miller, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, netdev, lkml

On Mon, 4 Nov 2019 at 23:19, Vinicius Costa Gomes
<vinicius.gomes@intel.com> wrote:
>
> Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> writes:
>
> > Don't swap oper and admin schedules too early, it's not correct and
> > causes crash.
> >
> > Steps to reproduce:
> >
> > 1)
> > tc qdisc replace dev eth0 parent root handle 100 taprio \
> >     num_tc 3 \
> >     map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
> >     queues 1@0 1@1 1@2 \
> >     base-time $SOME_BASE_TIME \
> >     sched-entry S 01 80000 \
> >     sched-entry S 02 15000 \
> >     sched-entry S 04 40000 \
> >     flags 2
> >
> > 2)
> > tc qdisc replace dev eth0 parent root handle 100 taprio \
> >     base-time $SOME_BASE_TIME \
> >     sched-entry S 01 90000 \
> >     sched-entry S 02 20000 \
> >     sched-entry S 04 40000 \
> >     flags 2
> >
> > 3)
> > tc qdisc replace dev eth0 parent root handle 100 taprio \
> >     base-time $SOME_BASE_TIME \
> >     sched-entry S 01 150000 \
> >     sched-entry S 02 200000 \
> >     sched-entry S 04 40000 \
> >     flags 2
> >
> > Do 2 3 2 .. steps  more times if not happens and observe:
> >
> > [  305.832319] Unable to handle kernel write to read-only memory at
> > virtual address ffff0000087ce7f0
> > [  305.910887] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> > [  305.919306] Hardware name: Texas Instruments AM654 Base Board (DT)
> >
> > [...]
> >
> > [  306.017119] x1 : ffff800848031d88 x0 : ffff800848031d80
> > [  306.022422] Call trace:
> > [  306.024866]  taprio_free_sched_cb+0x4c/0x98
> > [  306.029040]  rcu_process_callbacks+0x25c/0x410
> > [  306.033476]  __do_softirq+0x10c/0x208
> > [  306.037132]  irq_exit+0xb8/0xc8
> > [  306.040267]  __handle_domain_irq+0x64/0xb8
> > [  306.044352]  gic_handle_irq+0x7c/0x178
> > [  306.048092]  el1_irq+0xb0/0x128
> > [  306.051227]  arch_cpu_idle+0x10/0x18
> > [  306.054795]  do_idle+0x120/0x138
> > [  306.058015]  cpu_startup_entry+0x20/0x28
> > [  306.061931]  rest_init+0xcc/0xd8
> > [  306.065154]  start_kernel+0x3bc/0x3e4
> > [  306.068810] Code: f2fbd5b7 f2fbd5b6 d503201f f9400422 (f9000662)
> > [  306.074900] ---[ end trace 96c8e2284a9d9d6e ]---
> > [  306.079507] Kernel panic - not syncing: Fatal exception in interrupt
> > [  306.085847] SMP: stopping secondary CPUs
> > [  306.089765] Kernel Offset: disabled
> >
> > Try to explain one of the possible crash cases:
> >
> > The "real" admin list is assigned when admin_sched is set to
> > new_admin, it happens after "swap", that assigns to oper_sched NULL.
> > Thus if call qdisc show it can crash.
> >
> > Farther, next second time, when sched list is updated, the admin_sched
> > is not NULL and becomes the oper_sched, previous oper_sched was NULL so
> > just skipped. But then admin_sched is assigned new_admin, but schedules
> > to free previous assigned admin_sched (that already became oper_sched).
> >
> > Farther, next third time, when sched list is updated,
> > while one more swap, oper_sched is not null, but it was happy to be
> > freed already (while prev. admin update), so while try to free
> > oper_sched the kernel panic happens at taprio_free_sched_cb().
> >
> > So, move the "swap emulation" where it should be according to function
> > comment from code.
> >
> > Fixes: 9c66d15646760e ("taprio: Add support for hardware offloading")
> > Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
> > ---
>
> As it solves a crash, and I have no problems with this fix:
>
> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>
> But reading the code, I got the feeling that upstream "swap emulation"
> part of the code is not working as it should, perhaps it was lost during
> upstreaming of the patch? Vladimir, can you confirm that this works for
> you? (yeah, this can be solved later)
>
>
> Cheers,
> --
> Vinicius
>

Hi Ivan, Vinicius,

I haven't actually used the "replace" functionality. I found it
strange that subsequent "replace" commands can't have the same syntax
as the first one, so it's kind of difficult to integrate them in a
script. And given the fact that I don't actually have a use case for
"replace", I left this where it is (apparently causing this race when
offloading a config and q->oper_sched is not NULL).

Actually there was yet another issue introduced by my patch, which I
discovered with lockdep after it got merged. I never managed to figure
out what it was saying, but it looks like your fix addresses that as
well, for some reason:

[94787.464015] =====================================================
[94787.470076] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
[94787.476654] 5.4.0-rc1-00457-g19cf0a771e53 #300 Not tainted
[94787.482106] -----------------------------------------------------
[94787.488165] tc/2575 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
[94787.494224] da592db0 (&(&q->current_entry_lock)->rlock){+.+.}, at:
taprio_change+0xddc/0xf04
[94787.502633]
[94787.502633] and this task is already holding:
[94787.508431] da592c88 (&(&sch->q.lock)->rlock){+.-.}, at:
taprio_change+0xa94/0xf04
[94787.515965] which would create a new lock dependency:
[94787.520985]  (&(&sch->q.lock)->rlock){+.-.} ->
(&(&q->current_entry_lock)->rlock){+.+.}
[94787.528952]
[94787.528952] but this new dependency connects a SOFTIRQ-irq-safe lock:
[94787.536823]  (&(&sch->q.lock)->rlock){+.-.}
[94787.536828]
[94787.536828] ... which became SOFTIRQ-irq-safe at:
[94787.547129]   _raw_spin_lock+0x44/0x54
[94787.550858]   sch_direct_xmit+0x274/0x2ac
[94787.554847]   __dev_queue_xmit+0xc7c/0xfcc
[94787.558920]   br_dev_queue_push_xmit+0x84/0x1ac
[94787.563426]   br_forward_finish+0x21c/0x230
[94787.567583]   __br_forward+0x2fc/0x30c
[94787.571312]   br_dev_xmit+0x358/0x648
[94787.574953]   dev_hard_start_xmit+0xf4/0x3b0
[94787.579198]   __dev_queue_xmit+0xe38/0xfcc
[94787.583274]   arp_solicit+0x1a0/0x504
[94787.586916]   neigh_probe+0x54/0x7c
[94787.590385]   neigh_timer_handler+0x9c/0x3a4
[94787.594631]   call_timer_fn+0xd4/0x3bc
[94787.598357]   run_timer_softirq+0x24c/0x6a8
[94787.602517]   __do_softirq+0x114/0x54c
[94787.606244]   irq_exit+0x168/0x178
[94787.609627]   __handle_domain_irq+0x60/0xb4
[94787.613788]   gic_handle_irq+0x58/0x9c
[94787.617515]   __irq_svc+0x70/0x98
[94787.620811]   arch_cpu_idle+0x24/0x3c
[94787.624451]   arch_cpu_idle+0x24/0x3c
[94787.628093]   do_idle+0x1c4/0x2b0
[94787.631388]   cpu_startup_entry+0x18/0x1c
[94787.635373]   0x8030278c
[94787.637888]
[94787.637888] to a SOFTIRQ-irq-unsafe lock:
[94787.643339]  (&(&q->current_entry_lock)->rlock){+.+.}
[94787.643345]
[94787.643345] ... which became SOFTIRQ-irq-unsafe at:
[94787.654675] ...
[94787.654680]   _raw_spin_lock+0x44/0x54
[94787.660140]   taprio_offload_config_changed+0x14/0x100
[94787.665248]   taprio_change+0xa78/0xf04
[94787.669063]   qdisc_create+0x1f0/0x53c
[94787.672790]   tc_modify_qdisc+0x1b0/0x834
[94787.676777]   rtnetlink_rcv_msg+0x1a4/0x510
[94787.680937]   netlink_rcv_skb+0xe0/0x114
[94787.684837]   netlink_unicast+0x17c/0x1f8
[94787.688822]   netlink_sendmsg+0x2c4/0x370
[94787.692809]   ___sys_sendmsg+0x22c/0x24c
[94787.696709]   __sys_sendmsg+0x50/0x8c
[94787.700349]   ret_fast_syscall+0x0/0x28
[94787.704161]   0xbeb4e208
[94787.706675]
[94787.706675] other info that might help us debug this:
[94787.706675]
[94787.714634]  Possible interrupt unsafe locking scenario:
[94787.714634]
[94787.721382]        CPU0                    CPU1
[94787.725884]        ----                    ----
[94787.730384]   lock(&(&q->current_entry_lock)->rlock);
[94787.735406]                                local_irq_disable();
[94787.741290]                                lock(&(&sch->q.lock)->rlock);
[94787.747954]
lock(&(&q->current_entry_lock)->rlock);
[94787.755481]   <Interrupt>
[94787.758082]     lock(&(&sch->q.lock)->rlock);
[94787.762412]
[94787.762412]  *** DEADLOCK ***
[94787.762412]
[94787.768297] 2 locks held by tc/2575:
[94787.771848]  #0: c21b18ec (rtnl_mutex){+.+.}, at:
rtnetlink_rcv_msg+0x174/0x510
[94787.779127]  #1: da592c88 (&(&sch->q.lock)->rlock){+.-.}, at:
taprio_change+0xa94/0xf04
[94787.787095]
[94787.787095] the dependencies between SOFTIRQ-irq-safe lock and the
holding lock:
[94787.795939] -> (&(&sch->q.lock)->rlock){+.-.} {
[94787.800447]    HARDIRQ-ON-W at:
[94787.803572]                     _raw_spin_lock_bh+0x40/0x50
[94787.809114]                     dev_deactivate_queue.constprop.13+0x4c/0xd8
[94787.816038]                     dev_deactivate_many+0x5c/0x3d8
[94787.821838]                     dev_deactivate+0x3c/0x64
[94787.827121]                     linkwatch_do_dev+0x4c/0x80
[94787.832577]                     __linkwatch_run_queue+0x198/0x1e0
[94787.838637]                     linkwatch_event+0x2c/0x34
[94787.844008]                     process_one_work+0x300/0x7f4
[94787.849636]                     worker_thread+0x58/0x5a0
[94787.854919]                     kthread+0x12c/0x160
[94787.859769]                     ret_from_fork+0x14/0x20
[94787.864963]                     0x0
[94787.868427]    IN-SOFTIRQ-W at:
[94787.871550]                     _raw_spin_lock+0x44/0x54
[94787.876833]                     sch_direct_xmit+0x274/0x2ac
[94787.882377]                     __dev_queue_xmit+0xc7c/0xfcc
[94787.888005]                     br_dev_queue_push_xmit+0x84/0x1ac
[94787.894065]                     br_forward_finish+0x21c/0x230
[94787.899778]                     __br_forward+0x2fc/0x30c
[94787.905063]                     br_dev_xmit+0x358/0x648
[94787.910259]                     dev_hard_start_xmit+0xf4/0x3b0
[94787.916060]                     __dev_queue_xmit+0xe38/0xfcc
[94787.921689]                     arp_solicit+0x1a0/0x504
[94787.926887]                     neigh_probe+0x54/0x7c
[94787.931911]                     neigh_timer_handler+0x9c/0x3a4
[94787.937711]                     call_timer_fn+0xd4/0x3bc
[94787.942993]                     run_timer_softirq+0x24c/0x6a8
[94787.948708]                     __do_softirq+0x114/0x54c
[94787.953989]                     irq_exit+0x168/0x178
[94787.958927]                     __handle_domain_irq+0x60/0xb4
[94787.964642]                     gic_handle_irq+0x58/0x9c
[94787.969924]                     __irq_svc+0x70/0x98
[94787.974775]                     arch_cpu_idle+0x24/0x3c
[94787.979970]                     arch_cpu_idle+0x24/0x3c
[94787.985167]                     do_idle+0x1c4/0x2b0
[94787.990017]                     cpu_startup_entry+0x18/0x1c
[94787.995556]                     0x8030278c
[94787.999626]    INITIAL USE at:
[94788.002664]                    _raw_spin_lock_bh+0x40/0x50
[94788.008120]                    dev_deactivate_queue.constprop.13+0x4c/0xd8
[94788.014958]                    dev_deactivate_many+0x5c/0x3d8
[94788.020671]                    dev_deactivate+0x3c/0x64
[94788.025868]                    linkwatch_do_dev+0x4c/0x80
[94788.031237]                    __linkwatch_run_queue+0x198/0x1e0
[94788.037210]                    linkwatch_event+0x2c/0x34
[94788.042493]                    process_one_work+0x300/0x7f4
[94788.048035]                    worker_thread+0x58/0x5a0
[94788.053231]                    kthread+0x12c/0x160
[94788.057996]                    ret_from_fork+0x14/0x20
[94788.063104]                    0x0
[94788.066482]  }
[94788.068138]  ... key      at: [<c2862598>] __key.72250+0x0/0x8
[94788.073937]  ... acquired at:
[94788.076885]    taprio_change+0xddc/0xf04
[94788.080785]    qdisc_create+0x1f0/0x53c
[94788.084599]    tc_modify_qdisc+0x1b0/0x834
[94788.088671]    rtnetlink_rcv_msg+0x1a4/0x510
[94788.092917]    netlink_rcv_skb+0xe0/0x114
[94788.096904]    netlink_unicast+0x17c/0x1f8
[94788.100976]    netlink_sendmsg+0x2c4/0x370
[94788.105048]    ___sys_sendmsg+0x22c/0x24c
[94788.109034]    __sys_sendmsg+0x50/0x8c
[94788.112761]    ret_fast_syscall+0x0/0x28
[94788.116658]    0xbeb4e208
[94788.119259]
[94788.120736]
[94788.120736] the dependencies between the lock to be acquired
[94788.120739]  and SOFTIRQ-irq-unsafe lock:
[94788.131825] -> (&(&q->current_entry_lock)->rlock){+.+.} {
[94788.137196]    HARDIRQ-ON-W at:
[94788.140320]                     _raw_spin_lock+0x44/0x54
[94788.145602]                     taprio_offload_config_changed+0x14/0x100
[94788.152267]                     taprio_change+0xa78/0xf04
[94788.157636]                     qdisc_create+0x1f0/0x53c
[94788.162919]                     tc_modify_qdisc+0x1b0/0x834
[94788.168460]                     rtnetlink_rcv_msg+0x1a4/0x510
[94788.174175]                     netlink_rcv_skb+0xe0/0x114
[94788.179631]                     netlink_unicast+0x17c/0x1f8
[94788.185173]                     netlink_sendmsg+0x2c4/0x370
[94788.190714]                     ___sys_sendmsg+0x22c/0x24c
[94788.196170]                     __sys_sendmsg+0x50/0x8c
[94788.201365]                     ret_fast_syscall+0x0/0x28
[94788.206731]                     0xbeb4e208
[94788.210801]    SOFTIRQ-ON-W at:
[94788.213924]                     _raw_spin_lock+0x44/0x54
[94788.219206]                     taprio_offload_config_changed+0x14/0x100
[94788.225871]                     taprio_change+0xa78/0xf04
[94788.231239]                     qdisc_create+0x1f0/0x53c
[94788.236523]                     tc_modify_qdisc+0x1b0/0x834
[94788.242064]                     rtnetlink_rcv_msg+0x1a4/0x510
[94788.247779]                     netlink_rcv_skb+0xe0/0x114
[94788.253234]                     netlink_unicast+0x17c/0x1f8
[94788.258776]                     netlink_sendmsg+0x2c4/0x370
[94788.264317]                     ___sys_sendmsg+0x22c/0x24c
[94788.269773]                     __sys_sendmsg+0x50/0x8c
[94788.274968]                     ret_fast_syscall+0x0/0x28
[94788.280336]                     0xbeb4e208
[94788.284405]    INITIAL USE at:
[94788.287442]                    _raw_spin_lock+0x44/0x54
[94788.292638]                    taprio_offload_config_changed+0x14/0x100
[94788.299216]                    taprio_change+0xa78/0xf04
[94788.304497]                    qdisc_create+0x1f0/0x53c
[94788.309694]                    tc_modify_qdisc+0x1b0/0x834
[94788.315149]                    rtnetlink_rcv_msg+0x1a4/0x510
[94788.320777]                    netlink_rcv_skb+0xe0/0x114
[94788.326146]                    netlink_unicast+0x17c/0x1f8
[94788.331602]                    netlink_sendmsg+0x2c4/0x370
[94788.337057]                    ___sys_sendmsg+0x22c/0x24c
[94788.342426]                    __sys_sendmsg+0x50/0x8c
[94788.347535]                    ret_fast_syscall+0x0/0x28
[94788.352815]                    0xbeb4e208
[94788.356799]  }
[94788.358453]  ... key      at: [<c28626b4>] __key.70731+0x0/0x8
[94788.364251]  ... acquired at:
[94788.367199]    taprio_change+0xddc/0xf04
[94788.371099]    qdisc_create+0x1f0/0x53c
[94788.374913]    tc_modify_qdisc+0x1b0/0x834
[94788.378985]    rtnetlink_rcv_msg+0x1a4/0x510
[94788.383231]    netlink_rcv_skb+0xe0/0x114
[94788.387217]    netlink_unicast+0x17c/0x1f8
[94788.391290]    netlink_sendmsg+0x2c4/0x370
[94788.395362]    ___sys_sendmsg+0x22c/0x24c
[94788.399348]    __sys_sendmsg+0x50/0x8c
[94788.403074]    ret_fast_syscall+0x0/0x28
[94788.406972]    0xbeb4e208
[94788.409572]
[94788.411049]
[94788.411049] stack backtrace:
[94788.415384] CPU: 1 PID: 2575 Comm: tc Not tainted
5.4.0-rc1-00457-g19cf0a771e53 #300
[94788.423083] Hardware name: Freescale LS1021A
[94788.427338] [<c03142ec>] (unwind_backtrace) from [<c030e034>]
(show_stack+0x10/0x14)
[94788.435045] [<c030e034>] (show_stack) from [<c11b561c>]
(dump_stack+0xcc/0xf8)
[94788.442233] [<c11b561c>] (dump_stack) from [<c03b5fd0>]
(__lock_acquire+0x1d90/0x2398)
[94788.450111] [<c03b5fd0>] (__lock_acquire) from [<c03b6ec8>]
(lock_acquire+0xe0/0x22c)
[94788.457903] [<c03b6ec8>] (lock_acquire) from [<c11d9200>]
(_raw_spin_lock_irqsave+0x54/0x68)
[94788.466298] [<c11d9200>] (_raw_spin_lock_irqsave) from [<c0fdd894>]
(taprio_change+0xddc/0xf04)
[94788.474956] [<c0fdd894>] (taprio_change) from [<c0fc1698>]
(qdisc_create+0x1f0/0x53c)
[94788.482749] [<c0fc1698>] (qdisc_create) from [<c0fc1b94>]
(tc_modify_qdisc+0x1b0/0x834)
[94788.490714] [<c0fc1b94>] (tc_modify_qdisc) from [<c0f73c48>]
(rtnetlink_rcv_msg+0x1a4/0x510)
[94788.499112] [<c0f73c48>] (rtnetlink_rcv_msg) from [<c0ff13a8>]
(netlink_rcv_skb+0xe0/0x114)
[94788.507422] [<c0ff13a8>] (netlink_rcv_skb) from [<c0ff0bec>]
(netlink_unicast+0x17c/0x1f8)
[94788.515647] [<c0ff0bec>] (netlink_unicast) from [<c0ff0f2c>]
(netlink_sendmsg+0x2c4/0x370)
[94788.523872] [<c0ff0f2c>] (netlink_sendmsg) from [<c0f2da7c>]
(___sys_sendmsg+0x22c/0x24c)
[94788.532010] [<c0f2da7c>] (___sys_sendmsg) from [<c0f2ee90>]
(__sys_sendmsg+0x50/0x8c)
[94788.539802] [<c0f2ee90>] (__sys_sendmsg) from [<c0301000>]
(ret_fast_syscall+0x0/0x28)
[94788.547676] Exception stack(0xe9db3fa8 to 0xe9db3ff0)
[94788.552701] 3fa0:                   b6f4d8c8 00000114 00000003
beb4e260 00000000 00000000
[94788.560837] 3fc0: b6f4d8c8 00000114 004eb02c 00000128 5da59019
00000000 0000002d 00522c98
[94788.568971] 3fe0: 00000070 beb4e208 004fa950 b6dbee64

But anyway, yes it makes sense to call taprio_offload_config_changed
only after rcu_assign_pointer(q->admin_sched, new_admin), and that is
part of the reason why I found it difficult to add a mechanism for
drivers to notify back on offload activation time that doesn't trigger
any races in taprio.

Tested-by: Vladimir Oltean <olteanv@gmail.com>

Thanks,
-Vladimir

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] taprio: fix panic while hw offload sched list swap
  2019-11-01 23:28 [PATCH net] taprio: fix panic while hw offload sched list swap Ivan Khoronzhuk
  2019-11-04 21:20 ` Vinicius Costa Gomes
@ 2019-11-05 21:59 ` David Miller
  1 sibling, 0 replies; 4+ messages in thread
From: David Miller @ 2019-11-05 21:59 UTC (permalink / raw)
  To: ivan.khoronzhuk
  Cc: vinicius.gomes, jhs, xiyou.wangcong, jiri, netdev, linux-kernel

From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Date: Sat,  2 Nov 2019 01:28:28 +0200

> Don't swap oper and admin schedules too early, it's not correct and
> causes crash.
 ...

Applied, thanks.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-11-05 21:59 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-01 23:28 [PATCH net] taprio: fix panic while hw offload sched list swap Ivan Khoronzhuk
2019-11-04 21:20 ` Vinicius Costa Gomes
2019-11-04 22:56   ` Vladimir Oltean
2019-11-05 21:59 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).