* [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini()
@ 2021-11-08 14:53 Taehee Yoo
2021-11-10 3:20 ` patchwork-bot+netdevbpf
2021-11-11 15:37 ` Jakub Kicinski
0 siblings, 2 replies; 4+ messages in thread
From: Taehee Yoo @ 2021-11-08 14:53 UTC (permalink / raw)
To: davem, kuba, netdev; +Cc: ap420073
When the amt module is being removed, it calls flush_delayed_work() to exit
source_gc_wq. But it wouldn't be exited properly because the
amt_source_gc_work(), which is the callback function of source_gc_wq
internally calls mod_delayed_work() again.
So, amt_source_gc_work() would be called after the amt module is removed.
Therefore kernel panic would occur.
In order to avoid it, cancel_delayed_work() should be used instead of
flush_delayed_work().
Test commands:
modprobe amt
modprobe -rv amt
Splat looks like:
BUG: unable to handle page fault for address: fffffbfff80f50db
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1237ee067 P4D 1237ee067 PUD 1237b2067 PMD 100c11067 PTE 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0+ #27
5a0ebebc29fe5c40c68bea90197606c3a832b09f
RIP: 0010:run_timer_softirq+0x221/0xfc0
Code: 00 00 4c 89 e1 4c 8b 30 48 c1 e9 03 80 3c 29 00 0f 85 ed 0b 00 00
4d 89 34 24 4d 85 f6 74 19 49 8d 7e 08 48 89 f9 48 c1 e9 03 <80> 3c 29 00
0f 85 fa 0b 00 00 4d 89 66 08 83 04 24 01 49 89 d4 48
RSP: 0018:ffff888119009e50 EFLAGS: 00010806
RAX: ffff8881191f8a80 RBX: 00000000007ffe2a RCX: 1ffffffff80f50db
RDX: ffff888119009ed0 RSI: 0000000000000008 RDI: ffffffffc07a86d8
RBP: dffffc0000000000 R08: ffff8881191f8280 R09: ffffed102323f061
R10: ffff8881191f8307 R11: ffffed102323f060 R12: ffff888119009ec8
R13: 00000000000000c0 R14: ffffffffc07a86d0 R15: ffff8881191f82e8
FS: 0000000000000000(0000) GS:ffff888119000000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffbfff80f50db CR3: 00000001062dc002 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
? add_timer+0x650/0x650
? kvm_clock_read+0x14/0x30
? ktime_get+0xb9/0x180
? rcu_read_lock_held_common+0xe/0xa0
? rcu_read_lock_sched_held+0x56/0xc0
? rcu_read_lock_bh_held+0xa0/0xa0
? hrtimer_interrupt+0x271/0x790
__do_softirq+0x1d0/0x88f
irq_exit_rcu+0xe7/0x120
sysvec_apic_timer_interrupt+0x8a/0xb0
</IRQ>
<TASK>
[ ... ]
Fixes: bc54e49c140b ("amt: add multicast(IGMP) report message handler")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
drivers/net/amt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/amt.c b/drivers/net/amt.c
index c384b2694f9e..47a04c330885 100644
--- a/drivers/net/amt.c
+++ b/drivers/net/amt.c
@@ -3286,7 +3286,7 @@ static void __exit amt_fini(void)
{
rtnl_link_unregister(&amt_link_ops);
unregister_netdevice_notifier(&amt_notifier_block);
- flush_delayed_work(&source_gc_wq);
+ cancel_delayed_work(&source_gc_wq);
__amt_source_gc_work();
destroy_workqueue(amt_wq);
}
--
2.17.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini()
2021-11-08 14:53 [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini() Taehee Yoo
@ 2021-11-10 3:20 ` patchwork-bot+netdevbpf
2021-11-11 15:37 ` Jakub Kicinski
1 sibling, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-11-10 3:20 UTC (permalink / raw)
To: Taehee Yoo; +Cc: davem, kuba, netdev
Hello:
This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 8 Nov 2021 14:53:40 +0000 you wrote:
> When the amt module is being removed, it calls flush_delayed_work() to exit
> source_gc_wq. But it wouldn't be exited properly because the
> amt_source_gc_work(), which is the callback function of source_gc_wq
> internally calls mod_delayed_work() again.
> So, amt_source_gc_work() would be called after the amt module is removed.
> Therefore kernel panic would occur.
> In order to avoid it, cancel_delayed_work() should be used instead of
> flush_delayed_work().
>
> [...]
Here is the summary with links:
- [net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini()
https://git.kernel.org/netdev/net/c/43aa4937994f
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini()
2021-11-08 14:53 [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini() Taehee Yoo
2021-11-10 3:20 ` patchwork-bot+netdevbpf
@ 2021-11-11 15:37 ` Jakub Kicinski
2021-11-12 12:17 ` Taehee Yoo
1 sibling, 1 reply; 4+ messages in thread
From: Jakub Kicinski @ 2021-11-11 15:37 UTC (permalink / raw)
To: Taehee Yoo; +Cc: davem, netdev
On Mon, 8 Nov 2021 14:53:40 +0000 Taehee Yoo wrote:
> When the amt module is being removed, it calls flush_delayed_work() to exit
> source_gc_wq. But it wouldn't be exited properly because the
> amt_source_gc_work(), which is the callback function of source_gc_wq
> internally calls mod_delayed_work() again.
> So, amt_source_gc_work() would be called after the amt module is removed.
> Therefore kernel panic would occur.
> In order to avoid it, cancel_delayed_work() should be used instead of
> flush_delayed_work().
Somehow I convinced myself this is correct but now I'm not sure, again.
> diff --git a/drivers/net/amt.c b/drivers/net/amt.c
> index c384b2694f9e..47a04c330885 100644
> --- a/drivers/net/amt.c
> +++ b/drivers/net/amt.c
> @@ -3286,7 +3286,7 @@ static void __exit amt_fini(void)
> {
> rtnl_link_unregister(&amt_link_ops);
> unregister_netdevice_notifier(&amt_notifier_block);
> - flush_delayed_work(&source_gc_wq);
> + cancel_delayed_work(&source_gc_wq);
This doesn't guarantee that the work is not running _right_ now and
will re-arm itself on the next clock cycle, so to speak.
CPU 0 CPU 1
----- -----
worker gets the work
clears pending
work starts running
cancel_work
grabs pending
clears pending
mod_delayed_work()
You need cancel_delayed_work_sync(), right?
> __amt_source_gc_work();
> destroy_workqueue(amt_wq);
> }
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini()
2021-11-11 15:37 ` Jakub Kicinski
@ 2021-11-12 12:17 ` Taehee Yoo
0 siblings, 0 replies; 4+ messages in thread
From: Taehee Yoo @ 2021-11-12 12:17 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, netdev
Hi Jakub,
Thank you for your review!
On 11/12/21 12:37 AM, Jakub Kicinski wrote:
> On Mon, 8 Nov 2021 14:53:40 +0000 Taehee Yoo wrote:
>> When the amt module is being removed, it calls flush_delayed_work()
to exit
>> source_gc_wq. But it wouldn't be exited properly because the
>> amt_source_gc_work(), which is the callback function of source_gc_wq
>> internally calls mod_delayed_work() again.
>> So, amt_source_gc_work() would be called after the amt module is
removed.
>> Therefore kernel panic would occur.
>> In order to avoid it, cancel_delayed_work() should be used instead of
>> flush_delayed_work().
>
> Somehow I convinced myself this is correct but now I'm not sure, again.
>
>> diff --git a/drivers/net/amt.c b/drivers/net/amt.c
>> index c384b2694f9e..47a04c330885 100644
>> --- a/drivers/net/amt.c
>> +++ b/drivers/net/amt.c
>> @@ -3286,7 +3286,7 @@ static void __exit amt_fini(void)
>> {
>> rtnl_link_unregister(&amt_link_ops);
>> unregister_netdevice_notifier(&amt_notifier_block);
>> - flush_delayed_work(&source_gc_wq);
>> + cancel_delayed_work(&source_gc_wq);
>
> This doesn't guarantee that the work is not running _right_ now and
> will re-arm itself on the next clock cycle, so to speak.
>
> CPU 0 CPU 1
> ----- -----
>
> worker gets the work
> clears pending
> work starts running
> cancel_work
> grabs pending
> clears pending
> mod_delayed_work()
>
> You need cancel_delayed_work_sync(), right?
>
you're right!
I think cancel_delayed_work() is async so that it can't clearly fix this
problem.
So, I will send a new patch after some tests.
Thank you so much for catching it!
Thanks a lot,
Taehee
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-11-12 12:17 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-08 14:53 [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini() Taehee Yoo
2021-11-10 3:20 ` patchwork-bot+netdevbpf
2021-11-11 15:37 ` Jakub Kicinski
2021-11-12 12:17 ` Taehee Yoo
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.