All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini()
@ 2021-11-08 14:53 Taehee Yoo
  2021-11-10  3:20 ` patchwork-bot+netdevbpf
  2021-11-11 15:37 ` Jakub Kicinski
  0 siblings, 2 replies; 4+ messages in thread
From: Taehee Yoo @ 2021-11-08 14:53 UTC (permalink / raw)
  To: davem, kuba, netdev; +Cc: ap420073

When the amt module is being removed, it calls flush_delayed_work() to exit
source_gc_wq. But it wouldn't be exited properly because the
amt_source_gc_work(), which is the callback function of source_gc_wq
internally calls mod_delayed_work() again.
So, amt_source_gc_work() would be called after the amt module is removed.
Therefore kernel panic would occur.
In order to avoid it, cancel_delayed_work() should be used instead of
flush_delayed_work().

Test commands:
   modprobe amt
   modprobe -rv amt

Splat looks like:
 BUG: unable to handle page fault for address: fffffbfff80f50db
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 1237ee067 P4D 1237ee067 PUD 1237b2067 PMD 100c11067 PTE 0
 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0+ #27
 5a0ebebc29fe5c40c68bea90197606c3a832b09f
 RIP: 0010:run_timer_softirq+0x221/0xfc0
 Code: 00 00 4c 89 e1 4c 8b 30 48 c1 e9 03 80 3c 29 00 0f 85 ed 0b 00 00
 4d 89 34 24 4d 85 f6 74 19 49 8d 7e 08 48 89 f9 48 c1 e9 03 <80> 3c 29 00
 0f 85 fa 0b 00 00 4d 89 66 08 83 04 24 01 49 89 d4 48
 RSP: 0018:ffff888119009e50 EFLAGS: 00010806
 RAX: ffff8881191f8a80 RBX: 00000000007ffe2a RCX: 1ffffffff80f50db
 RDX: ffff888119009ed0 RSI: 0000000000000008 RDI: ffffffffc07a86d8
 RBP: dffffc0000000000 R08: ffff8881191f8280 R09: ffffed102323f061
 R10: ffff8881191f8307 R11: ffffed102323f060 R12: ffff888119009ec8
 R13: 00000000000000c0 R14: ffffffffc07a86d0 R15: ffff8881191f82e8
 FS:  0000000000000000(0000) GS:ffff888119000000(0000)
 knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: fffffbfff80f50db CR3: 00000001062dc002 CR4: 00000000003706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <IRQ>
  ? add_timer+0x650/0x650
  ? kvm_clock_read+0x14/0x30
  ? ktime_get+0xb9/0x180
  ? rcu_read_lock_held_common+0xe/0xa0
  ? rcu_read_lock_sched_held+0x56/0xc0
  ? rcu_read_lock_bh_held+0xa0/0xa0
  ? hrtimer_interrupt+0x271/0x790
  __do_softirq+0x1d0/0x88f
  irq_exit_rcu+0xe7/0x120
  sysvec_apic_timer_interrupt+0x8a/0xb0
  </IRQ>
  <TASK>
[ ... ]

Fixes: bc54e49c140b ("amt: add multicast(IGMP) report message handler")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
 drivers/net/amt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/amt.c b/drivers/net/amt.c
index c384b2694f9e..47a04c330885 100644
--- a/drivers/net/amt.c
+++ b/drivers/net/amt.c
@@ -3286,7 +3286,7 @@ static void __exit amt_fini(void)
 {
 	rtnl_link_unregister(&amt_link_ops);
 	unregister_netdevice_notifier(&amt_notifier_block);
-	flush_delayed_work(&source_gc_wq);
+	cancel_delayed_work(&source_gc_wq);
 	__amt_source_gc_work();
 	destroy_workqueue(amt_wq);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini()
  2021-11-08 14:53 [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini() Taehee Yoo
@ 2021-11-10  3:20 ` patchwork-bot+netdevbpf
  2021-11-11 15:37 ` Jakub Kicinski
  1 sibling, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-11-10  3:20 UTC (permalink / raw)
  To: Taehee Yoo; +Cc: davem, kuba, netdev

Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon,  8 Nov 2021 14:53:40 +0000 you wrote:
> When the amt module is being removed, it calls flush_delayed_work() to exit
> source_gc_wq. But it wouldn't be exited properly because the
> amt_source_gc_work(), which is the callback function of source_gc_wq
> internally calls mod_delayed_work() again.
> So, amt_source_gc_work() would be called after the amt module is removed.
> Therefore kernel panic would occur.
> In order to avoid it, cancel_delayed_work() should be used instead of
> flush_delayed_work().
> 
> [...]

Here is the summary with links:
  - [net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini()
    https://git.kernel.org/netdev/net/c/43aa4937994f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini()
  2021-11-08 14:53 [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini() Taehee Yoo
  2021-11-10  3:20 ` patchwork-bot+netdevbpf
@ 2021-11-11 15:37 ` Jakub Kicinski
  2021-11-12 12:17   ` Taehee Yoo
  1 sibling, 1 reply; 4+ messages in thread
From: Jakub Kicinski @ 2021-11-11 15:37 UTC (permalink / raw)
  To: Taehee Yoo; +Cc: davem, netdev

On Mon,  8 Nov 2021 14:53:40 +0000 Taehee Yoo wrote:
> When the amt module is being removed, it calls flush_delayed_work() to exit
> source_gc_wq. But it wouldn't be exited properly because the
> amt_source_gc_work(), which is the callback function of source_gc_wq
> internally calls mod_delayed_work() again.
> So, amt_source_gc_work() would be called after the amt module is removed.
> Therefore kernel panic would occur.
> In order to avoid it, cancel_delayed_work() should be used instead of
> flush_delayed_work().

Somehow I convinced myself this is correct but now I'm not sure, again.

> diff --git a/drivers/net/amt.c b/drivers/net/amt.c
> index c384b2694f9e..47a04c330885 100644
> --- a/drivers/net/amt.c
> +++ b/drivers/net/amt.c
> @@ -3286,7 +3286,7 @@ static void __exit amt_fini(void)
>  {
>  	rtnl_link_unregister(&amt_link_ops);
>  	unregister_netdevice_notifier(&amt_notifier_block);
> -	flush_delayed_work(&source_gc_wq);
> +	cancel_delayed_work(&source_gc_wq);

This doesn't guarantee that the work is not running _right_ now and
will re-arm itself on the next clock cycle, so to speak.

 CPU 0                      CPU 1
 -----                      -----

 worker gets the work
 clears pending
 work starts running
                            cancel_work
                            grabs pending
                            clears pending
 mod_delayed_work()

You need cancel_delayed_work_sync(), right?

>  	__amt_source_gc_work();
>  	destroy_workqueue(amt_wq);
>  }


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini()
  2021-11-11 15:37 ` Jakub Kicinski
@ 2021-11-12 12:17   ` Taehee Yoo
  0 siblings, 0 replies; 4+ messages in thread
From: Taehee Yoo @ 2021-11-12 12:17 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev

Hi Jakub,
Thank you for your review!

On 11/12/21 12:37 AM, Jakub Kicinski wrote:
 > On Mon,  8 Nov 2021 14:53:40 +0000 Taehee Yoo wrote:
 >> When the amt module is being removed, it calls flush_delayed_work() 
to exit
 >> source_gc_wq. But it wouldn't be exited properly because the
 >> amt_source_gc_work(), which is the callback function of source_gc_wq
 >> internally calls mod_delayed_work() again.
 >> So, amt_source_gc_work() would be called after the amt module is 
removed.
 >> Therefore kernel panic would occur.
 >> In order to avoid it, cancel_delayed_work() should be used instead of
 >> flush_delayed_work().
 >
 > Somehow I convinced myself this is correct but now I'm not sure, again.
 >
 >> diff --git a/drivers/net/amt.c b/drivers/net/amt.c
 >> index c384b2694f9e..47a04c330885 100644
 >> --- a/drivers/net/amt.c
 >> +++ b/drivers/net/amt.c
 >> @@ -3286,7 +3286,7 @@ static void __exit amt_fini(void)
 >>   {
 >>   	rtnl_link_unregister(&amt_link_ops);
 >>   	unregister_netdevice_notifier(&amt_notifier_block);
 >> -	flush_delayed_work(&source_gc_wq);
 >> +	cancel_delayed_work(&source_gc_wq);
 >
 > This doesn't guarantee that the work is not running _right_ now and
 > will re-arm itself on the next clock cycle, so to speak.
 >
 >   CPU 0                      CPU 1
 >   -----                      -----
 >
 >   worker gets the work
 >   clears pending
 >   work starts running
 >                              cancel_work
 >                              grabs pending
 >                              clears pending
 >   mod_delayed_work()
 >
 > You need cancel_delayed_work_sync(), right?
 >

you're right!
I think cancel_delayed_work() is async so that it can't clearly fix this 
problem.
So, I will send a new patch after some tests.
Thank you so much for catching it!

Thanks a lot,
Taehee

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-11-12 12:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-08 14:53 [PATCH net] amt: use cancel_delayed_work() instead of flush_delayed_work() in amt_fini() Taehee Yoo
2021-11-10  3:20 ` patchwork-bot+netdevbpf
2021-11-11 15:37 ` Jakub Kicinski
2021-11-12 12:17   ` Taehee Yoo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.