* [Intel-wired-lan] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") @ 2023-03-07 9:54 Corinna Vinschen 2023-03-07 10:12 ` [Intel-wired-lan] [REGRESSION] " Paul Menzel 0 siblings, 1 reply; 16+ messages in thread From: Corinna Vinschen @ 2023-03-07 9:54 UTC (permalink / raw) To: Lin Ma, intel-wired-lan Hi, After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see a deadlock scenario when trying to unload the igb module. The reproducer is pretty simple: # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs # modprobe -r igb The hang is quite thorough, I assume it's suffering a deadlock between the rtnl_lock and the spinlock introduced by 6faee3d4ee8b. Any chance you could have a look? Thanks, Corinna _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") 2023-03-07 9:54 [Intel-wired-lan] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") Corinna Vinschen @ 2023-03-07 10:12 ` Paul Menzel 2023-03-07 10:19 ` Paul Menzel 0 siblings, 1 reply; 16+ messages in thread From: Paul Menzel @ 2023-03-07 10:12 UTC (permalink / raw) To: Lin Ma; +Cc: intel-wired-lan, regressions [Adding regressions@lists.linux.dev] #regzbot ^introduced: 6faee3d4ee8b Am 07.03.23 um 10:54 schrieb Corinna Vinschen: > Hi, > > > After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see > a deadlock scenario when trying to unload the igb module. > > The reproducer is pretty simple: > > # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs > # modprobe -r igb > > The hang is quite thorough, I assume it's suffering a deadlock between > the rtnl_lock and the spinlock introduced by 6faee3d4ee8b. > > Any chance you could have a look? > > > Thanks, > Corinna _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") 2023-03-07 10:12 ` [Intel-wired-lan] [REGRESSION] " Paul Menzel @ 2023-03-07 10:19 ` Paul Menzel 2023-03-07 10:36 ` Lin Ma 2023-03-07 11:48 ` Lin Ma 0 siblings, 2 replies; 16+ messages in thread From: Paul Menzel @ 2023-03-07 10:19 UTC (permalink / raw) To: Lin Ma, Corinna Vinschen; +Cc: intel-wired-lan, regressions [Add Corinna back to Cc:, who was removed by Mozilla Thunderbird because Reply-To header was set for some reason in the original report.] Am 07.03.23 um 11:12 schrieb Paul Menzel: > [Adding regressions@lists.linux.dev] > > #regzbot ^introduced: 6faee3d4ee8b > > Am 07.03.23 um 10:54 schrieb Corinna Vinschen: >> Hi, >> >> >> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see >> a deadlock scenario when trying to unload the igb module. >> >> The reproducer is pretty simple: >> >> # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs >> # modprobe -r igb >> >> The hang is quite thorough, I assume it's suffering a deadlock between >> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b. >> >> Any chance you could have a look? >> >> >> Thanks, >> Corinna _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") 2023-03-07 10:19 ` Paul Menzel @ 2023-03-07 10:36 ` Lin Ma 2023-03-07 12:11 ` Corinna Vinschen 2023-03-07 11:48 ` Lin Ma 1 sibling, 1 reply; 16+ messages in thread From: Lin Ma @ 2023-03-07 10:36 UTC (permalink / raw) To: Paul Menzel; +Cc: intel-wired-lan, Corinna Vinschen, regressions Hello there Yeah I am looking at it. Could you please offer the crash log or locking debug message if accessible? Thanks in advance. Regards Lin >[Add Corinna back to Cc:, who was removed by Mozilla Thunderbird because >Reply-To header was set for some reason in the original report.] > >Am 07.03.23 um 11:12 schrieb Paul Menzel: >> [Adding regressions@lists.linux.dev] >> >> #regzbot ^introduced: 6faee3d4ee8b >> >> Am 07.03.23 um 10:54 schrieb Corinna Vinschen: >>> Hi, >>> >>> >>> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see >>> a deadlock scenario when trying to unload the igb module. >>> >>> The reproducer is pretty simple: >>> >>> # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs >>> # modprobe -r igb >>> >>> The hang is quite thorough, I assume it's suffering a deadlock between >>> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b. >>> >>> Any chance you could have a look? >>> >>> >>> Thanks, >>> Corinna _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") 2023-03-07 10:36 ` Lin Ma @ 2023-03-07 12:11 ` Corinna Vinschen 2023-03-07 12:37 ` Lin Ma 0 siblings, 1 reply; 16+ messages in thread From: Corinna Vinschen @ 2023-03-07 12:11 UTC (permalink / raw) To: Lin Ma; +Cc: Paul Menzel, intel-wired-lan, regressions Hi Lin, On Mar 7 18:36, Lin Ma wrote: > Hello there > > Yeah I am looking at it. Could you please offer the crash log or > locking debug message if accessible? Thanks in advance. The commands used to reroduce and the resulting log message: # echo 10 > /proc/sys/kernel/hung_task_timeout_secs # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs # modprobe -r igb [hang] console log: [ 116.914656] pci 0000:84:10.2: [8086:1520] type 00 class 0x020000 [ 116.915722] pci 0000:84:10.6: [8086:1520] type 00 class 0x020000 [ 116.917013] igb 0000:84:00.2: 2 VFs allocated [ 116.978350] igbvf: Intel(R) Gigabit Virtual Function Network Driver [ 116.979072] igbvf: Copyright (c) 2009 - 2012 Intel Corporation. [ 116.980253] igbvf 0000:84:10.2: enabling device (0000 -> 0002) [ 116.982356] igbvf 0000:84:10.2: PF still in reset state. Is the PF interface up? [ 116.983196] igbvf 0000:84:10.2: Assigning random MAC address. [ 116.985058] igbvf 0000:84:10.2: PF still resetting [ 117.011054] igbvf 0000:84:10.2: Intel(R) I350 Virtual Function [ 117.011785] igbvf 0000:84:10.2: Address: c2:e5:c2:a2:75:00 [ 117.012189] igbvf 0000:84:10.6: enabling device (0000 -> 0002) [ 117.023911] igbvf 0000:84:10.6: PF still in reset state. Is the PF interface up? [ 117.024724] igbvf 0000:84:10.6: Assigning random MAC address. [ 117.036215] igbvf 0000:84:10.6: PF still resetting [ 117.037596] igbvf 0000:84:10.6: Intel(R) I350 Virtual Function [ 117.038327] igbvf 0000:84:10.6: Address: ea:74:07:f4:28:7c [ 117.047970] igbvf 0000:84:10.6 ens5f2v1: renamed from eth1 [ 117.062847] igbvf 0000:84:10.2 ens5f2v0: renamed from eth0 [ 117.080725] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter [ 117.106106] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter [ 117.107189] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter [ 127.361316] igb 0000:84:00.3: removed PHC on ens5f3 [ 127.361975] igb 0000:84:00.3: DCA disabled [ 127.483085] igb 0000:84:00.2: removed PHC on ens5f2 [ 127.483786] igb 0000:84:00.2: DCA disabled [ 141.418410] INFO: task modprobe:2078 blocked for more than 10 seconds. [ 141.418856] Not tainted 6.2.0-rc8+ #12 [ 141.419184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 141.419616] task:modprobe state:D stack:0 pid:2078 ppid:2037 flags:0x00004000 [ 141.420039] Call Trace: [ 141.420169] <TASK> [ 141.420672] __schedule+0x2dd/0x840 [ 141.421427] schedule+0x50/0xc0 [ 141.422041] schedule_preempt_disabled+0x11/0x20 [ 141.422678] __mutex_lock.isra.13+0x431/0x6b0 [ 141.423324] unregister_netdev+0xe/0x20 [ 141.423578] igbvf_remove+0x45/0xe0 [igbvf] [ 141.423791] pci_device_remove+0x36/0xb0 [ 141.423990] device_release_driver_internal+0xc1/0x160 [ 141.424270] pci_stop_bus_device+0x6d/0x90 [ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20 [ 141.424789] pci_iov_remove_virtfn+0xba/0x120 [ 141.425452] sriov_disable+0x2f/0xf0 [ 141.425679] igb_disable_sriov+0x4e/0x100 [igb] [ 141.426353] igb_remove+0xa0/0x130 [igb] [ 141.426599] pci_device_remove+0x36/0xb0 [ 141.426796] device_release_driver_internal+0xc1/0x160 [ 141.427060] driver_detach+0x44/0x90 [ 141.427253] bus_remove_driver+0x55/0xe0 [ 141.427477] pci_unregister_driver+0x2a/0xa0 [ 141.428296] __x64_sys_delete_module+0x141/0x2b0 [ 141.429126] ? mntput_no_expire+0x4a/0x240 [ 141.429363] ? syscall_trace_enter.isra.19+0x126/0x1a0 [ 141.429653] do_syscall_64+0x5b/0x80 [ 141.429847] ? exit_to_user_mode_prepare+0x14d/0x1c0 [ 141.430109] ? syscall_exit_to_user_mode+0x12/0x30 [ 141.430849] ? do_syscall_64+0x67/0x80 [ 141.431083] ? syscall_exit_to_user_mode_prepare+0x183/0x1b0 [ 141.431770] ? syscall_exit_to_user_mode+0x12/0x30 [ 141.432482] ? do_syscall_64+0x67/0x80 [ 141.432714] ? exc_page_fault+0x64/0x140 [ 141.432911] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 141.433175] RIP: 0033:0x7ff04cc3a05b [ 141.433375] RSP: 002b:00007ffd891bcb38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 141.434337] RAX: ffffffffffffffda RBX: 000055fde256dd00 RCX: 00007ff04cc3a05b [ 141.435171] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055fde256dd68 [ 141.435973] RBP: 000055fde256dd68 R08: 00007ffd891bbae1 R09: 0000000000000000 [ 141.436852] R10: 00007ff04cd71480 R11: 0000000000000206 R12: 0000000000000000 [ 141.437636] R13: 0000000000000000 R14: 000055fde256dd68 R15: 0000000000000000 [ 141.438442] </TASK> Corinna _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") 2023-03-07 12:11 ` Corinna Vinschen @ 2023-03-07 12:37 ` Lin Ma 2023-03-07 12:55 ` Corinna Vinschen 0 siblings, 1 reply; 16+ messages in thread From: Lin Ma @ 2023-03-07 12:37 UTC (permalink / raw) To: intel-wired-lan; +Cc: Paul Menzel, regressions Hi Corinna, Thanks for the crash log. Seems the reason I didn't successfully reproduce the bug is that I didn't actually enable the sriov. According to the log, the deadlock seems obvious. I'm soooo noob to make such a mistake. > [ 141.423324] unregister_netdev+0xe/0x20 // <===== again grant rtnl_lock > [ 141.423578] igbvf_remove+0x45/0xe0 [igbvf] > [ 141.423791] pci_device_remove+0x36/0xb0 > [ 141.423990] device_release_driver_internal+0xc1/0x160 > [ 141.424270] pci_stop_bus_device+0x6d/0x90 > [ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20 > [ 141.424789] pci_iov_remove_virtfn+0xba/0x120 > [ 141.425452] sriov_disable+0x2f/0xf0 > [ 141.425679] igb_disable_sriov+0x4e/0x100 [igb] > [ 141.426353] igb_remove+0xa0/0x130 [igb] // <===== first grant rtnl_lock I will prepare the commit to revert this buggy commit and I will add the Report-by tag for you. Regards Lin > From: "Corinna Vinschen" <vinschen@redhat.com> > Sent Time: 2023-03-07 20:11:46 (Tuesday) > To: "Lin Ma" <linma@zju.edu.cn> > Cc: "Paul Menzel" <pmenzel@molgen.mpg.de>, intel-wired-lan <intel-wired-lan@lists.osuosl.org>, regressions <regressions@lists.linux.dev> > Subject: Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") > > Hi Lin, > > On Mar 7 18:36, Lin Ma wrote: > > Hello there > > > > Yeah I am looking at it. Could you please offer the crash log or > > locking debug message if accessible? Thanks in advance. > > The commands used to reroduce and the resulting log message: > > # echo 10 > /proc/sys/kernel/hung_task_timeout_secs > # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs > # modprobe -r igb > [hang] > > console log: > > [ 116.914656] pci 0000:84:10.2: [8086:1520] type 00 class 0x020000 > [ 116.915722] pci 0000:84:10.6: [8086:1520] type 00 class 0x020000 > [ 116.917013] igb 0000:84:00.2: 2 VFs allocated > [ 116.978350] igbvf: Intel(R) Gigabit Virtual Function Network Driver > [ 116.979072] igbvf: Copyright (c) 2009 - 2012 Intel Corporation. > [ 116.980253] igbvf 0000:84:10.2: enabling device (0000 -> 0002) > [ 116.982356] igbvf 0000:84:10.2: PF still in reset state. Is the PF interface up? > [ 116.983196] igbvf 0000:84:10.2: Assigning random MAC address. > [ 116.985058] igbvf 0000:84:10.2: PF still resetting > [ 117.011054] igbvf 0000:84:10.2: Intel(R) I350 Virtual Function > [ 117.011785] igbvf 0000:84:10.2: Address: c2:e5:c2:a2:75:00 > [ 117.012189] igbvf 0000:84:10.6: enabling device (0000 -> 0002) > [ 117.023911] igbvf 0000:84:10.6: PF still in reset state. Is the PF interface up? > [ 117.024724] igbvf 0000:84:10.6: Assigning random MAC address. > [ 117.036215] igbvf 0000:84:10.6: PF still resetting > [ 117.037596] igbvf 0000:84:10.6: Intel(R) I350 Virtual Function > [ 117.038327] igbvf 0000:84:10.6: Address: ea:74:07:f4:28:7c > [ 117.047970] igbvf 0000:84:10.6 ens5f2v1: renamed from eth1 > [ 117.062847] igbvf 0000:84:10.2 ens5f2v0: renamed from eth0 > [ 117.080725] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter > [ 117.106106] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter > [ 117.107189] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter > [ 127.361316] igb 0000:84:00.3: removed PHC on ens5f3 > [ 127.361975] igb 0000:84:00.3: DCA disabled > [ 127.483085] igb 0000:84:00.2: removed PHC on ens5f2 > [ 127.483786] igb 0000:84:00.2: DCA disabled > [ 141.418410] INFO: task modprobe:2078 blocked for more than 10 seconds. > [ 141.418856] Not tainted 6.2.0-rc8+ #12 > [ 141.419184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [ 141.419616] task:modprobe state:D stack:0 pid:2078 ppid:2037 flags:0x00004000 > [ 141.420039] Call Trace: > [ 141.420169] <TASK> > [ 141.420672] __schedule+0x2dd/0x840 > [ 141.421427] schedule+0x50/0xc0 > [ 141.422041] schedule_preempt_disabled+0x11/0x20 > [ 141.422678] __mutex_lock.isra.13+0x431/0x6b0 > [ 141.423324] unregister_netdev+0xe/0x20 > [ 141.423578] igbvf_remove+0x45/0xe0 [igbvf] > [ 141.423791] pci_device_remove+0x36/0xb0 > [ 141.423990] device_release_driver_internal+0xc1/0x160 > [ 141.424270] pci_stop_bus_device+0x6d/0x90 > [ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20 > [ 141.424789] pci_iov_remove_virtfn+0xba/0x120 > [ 141.425452] sriov_disable+0x2f/0xf0 > [ 141.425679] igb_disable_sriov+0x4e/0x100 [igb] > [ 141.426353] igb_remove+0xa0/0x130 [igb] > [ 141.426599] pci_device_remove+0x36/0xb0 > [ 141.426796] device_release_driver_internal+0xc1/0x160 > [ 141.427060] driver_detach+0x44/0x90 > [ 141.427253] bus_remove_driver+0x55/0xe0 > [ 141.427477] pci_unregister_driver+0x2a/0xa0 > [ 141.428296] __x64_sys_delete_module+0x141/0x2b0 > [ 141.429126] ? mntput_no_expire+0x4a/0x240 > [ 141.429363] ? syscall_trace_enter.isra.19+0x126/0x1a0 > [ 141.429653] do_syscall_64+0x5b/0x80 > [ 141.429847] ? exit_to_user_mode_prepare+0x14d/0x1c0 > [ 141.430109] ? syscall_exit_to_user_mode+0x12/0x30 > [ 141.430849] ? do_syscall_64+0x67/0x80 > [ 141.431083] ? syscall_exit_to_user_mode_prepare+0x183/0x1b0 > [ 141.431770] ? syscall_exit_to_user_mode+0x12/0x30 > [ 141.432482] ? do_syscall_64+0x67/0x80 > [ 141.432714] ? exc_page_fault+0x64/0x140 > [ 141.432911] entry_SYSCALL_64_after_hwframe+0x72/0xdc > [ 141.433175] RIP: 0033:0x7ff04cc3a05b > [ 141.433375] RSP: 002b:00007ffd891bcb38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 > [ 141.434337] RAX: ffffffffffffffda RBX: 000055fde256dd00 RCX: 00007ff04cc3a05b > [ 141.435171] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055fde256dd68 > [ 141.435973] RBP: 000055fde256dd68 R08: 00007ffd891bbae1 R09: 0000000000000000 > [ 141.436852] R10: 00007ff04cd71480 R11: 0000000000000206 R12: 0000000000000000 > [ 141.437636] R13: 0000000000000000 R14: 000055fde256dd68 R15: 0000000000000000 > [ 141.438442] </TASK> > > > Corinna _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") 2023-03-07 12:37 ` Lin Ma @ 2023-03-07 12:55 ` Corinna Vinschen 0 siblings, 0 replies; 16+ messages in thread From: Corinna Vinschen @ 2023-03-07 12:55 UTC (permalink / raw) To: Lin Ma; +Cc: Paul Menzel, intel-wired-lan, regressions Hi Lin, On Mar 7 20:37, Lin Ma wrote: > Hi Corinna, > > Thanks for the crash log. Seems the reason I didn't successfully reproduce the bug is that I didn't actually enable the sriov. > According to the log, the deadlock seems obvious. I'm soooo noob to make such a mistake. > > > [ 141.423324] unregister_netdev+0xe/0x20 // <===== again grant rtnl_lock > > [ 141.423578] igbvf_remove+0x45/0xe0 [igbvf] > > [ 141.423791] pci_device_remove+0x36/0xb0 > > [ 141.423990] device_release_driver_internal+0xc1/0x160 > > [ 141.424270] pci_stop_bus_device+0x6d/0x90 > > [ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20 > > [ 141.424789] pci_iov_remove_virtfn+0xba/0x120 > > [ 141.425452] sriov_disable+0x2f/0xf0 > > [ 141.425679] igb_disable_sriov+0x4e/0x100 [igb] > > [ 141.426353] igb_remove+0xa0/0x130 [igb] // <===== first grant rtnl_lock > > I will prepare the commit to revert this buggy commit and I will add > the Report-by tag for you. Great, thanks a lot! Corinna > Regards > Lin > > > > From: "Corinna Vinschen" <vinschen@redhat.com> > > Sent Time: 2023-03-07 20:11:46 (Tuesday) > > To: "Lin Ma" <linma@zju.edu.cn> > > Cc: "Paul Menzel" <pmenzel@molgen.mpg.de>, intel-wired-lan <intel-wired-lan@lists.osuosl.org>, regressions <regressions@lists.linux.dev> > > Subject: Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") > > > > Hi Lin, > > > > On Mar 7 18:36, Lin Ma wrote: > > > Hello there > > > > > > Yeah I am looking at it. Could you please offer the crash log or > > > locking debug message if accessible? Thanks in advance. > > > > The commands used to reroduce and the resulting log message: > > > > # echo 10 > /proc/sys/kernel/hung_task_timeout_secs > > # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs > > # modprobe -r igb > > [hang] > > > > console log: > > > > [ 116.914656] pci 0000:84:10.2: [8086:1520] type 00 class 0x020000 > > [ 116.915722] pci 0000:84:10.6: [8086:1520] type 00 class 0x020000 > > [ 116.917013] igb 0000:84:00.2: 2 VFs allocated > > [ 116.978350] igbvf: Intel(R) Gigabit Virtual Function Network Driver > > [ 116.979072] igbvf: Copyright (c) 2009 - 2012 Intel Corporation. > > [ 116.980253] igbvf 0000:84:10.2: enabling device (0000 -> 0002) > > [ 116.982356] igbvf 0000:84:10.2: PF still in reset state. Is the PF interface up? > > [ 116.983196] igbvf 0000:84:10.2: Assigning random MAC address. > > [ 116.985058] igbvf 0000:84:10.2: PF still resetting > > [ 117.011054] igbvf 0000:84:10.2: Intel(R) I350 Virtual Function > > [ 117.011785] igbvf 0000:84:10.2: Address: c2:e5:c2:a2:75:00 > > [ 117.012189] igbvf 0000:84:10.6: enabling device (0000 -> 0002) > > [ 117.023911] igbvf 0000:84:10.6: PF still in reset state. Is the PF interface up? > > [ 117.024724] igbvf 0000:84:10.6: Assigning random MAC address. > > [ 117.036215] igbvf 0000:84:10.6: PF still resetting > > [ 117.037596] igbvf 0000:84:10.6: Intel(R) I350 Virtual Function > > [ 117.038327] igbvf 0000:84:10.6: Address: ea:74:07:f4:28:7c > > [ 117.047970] igbvf 0000:84:10.6 ens5f2v1: renamed from eth1 > > [ 117.062847] igbvf 0000:84:10.2 ens5f2v0: renamed from eth0 > > [ 117.080725] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter > > [ 117.106106] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter > > [ 117.107189] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter > > [ 127.361316] igb 0000:84:00.3: removed PHC on ens5f3 > > [ 127.361975] igb 0000:84:00.3: DCA disabled > > [ 127.483085] igb 0000:84:00.2: removed PHC on ens5f2 > > [ 127.483786] igb 0000:84:00.2: DCA disabled > > [ 141.418410] INFO: task modprobe:2078 blocked for more than 10 seconds. > > [ 141.418856] Not tainted 6.2.0-rc8+ #12 > > [ 141.419184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [ 141.419616] task:modprobe state:D stack:0 pid:2078 ppid:2037 flags:0x00004000 > > [ 141.420039] Call Trace: > > [ 141.420169] <TASK> > > [ 141.420672] __schedule+0x2dd/0x840 > > [ 141.421427] schedule+0x50/0xc0 > > [ 141.422041] schedule_preempt_disabled+0x11/0x20 > > [ 141.422678] __mutex_lock.isra.13+0x431/0x6b0 > > [ 141.423324] unregister_netdev+0xe/0x20 > > [ 141.423578] igbvf_remove+0x45/0xe0 [igbvf] > > [ 141.423791] pci_device_remove+0x36/0xb0 > > [ 141.423990] device_release_driver_internal+0xc1/0x160 > > [ 141.424270] pci_stop_bus_device+0x6d/0x90 > > [ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20 > > [ 141.424789] pci_iov_remove_virtfn+0xba/0x120 > > [ 141.425452] sriov_disable+0x2f/0xf0 > > [ 141.425679] igb_disable_sriov+0x4e/0x100 [igb] > > [ 141.426353] igb_remove+0xa0/0x130 [igb] > > [ 141.426599] pci_device_remove+0x36/0xb0 > > [ 141.426796] device_release_driver_internal+0xc1/0x160 > > [ 141.427060] driver_detach+0x44/0x90 > > [ 141.427253] bus_remove_driver+0x55/0xe0 > > [ 141.427477] pci_unregister_driver+0x2a/0xa0 > > [ 141.428296] __x64_sys_delete_module+0x141/0x2b0 > > [ 141.429126] ? mntput_no_expire+0x4a/0x240 > > [ 141.429363] ? syscall_trace_enter.isra.19+0x126/0x1a0 > > [ 141.429653] do_syscall_64+0x5b/0x80 > > [ 141.429847] ? exit_to_user_mode_prepare+0x14d/0x1c0 > > [ 141.430109] ? syscall_exit_to_user_mode+0x12/0x30 > > [ 141.430849] ? do_syscall_64+0x67/0x80 > > [ 141.431083] ? syscall_exit_to_user_mode_prepare+0x183/0x1b0 > > [ 141.431770] ? syscall_exit_to_user_mode+0x12/0x30 > > [ 141.432482] ? do_syscall_64+0x67/0x80 > > [ 141.432714] ? exc_page_fault+0x64/0x140 > > [ 141.432911] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > [ 141.433175] RIP: 0033:0x7ff04cc3a05b > > [ 141.433375] RSP: 002b:00007ffd891bcb38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 > > [ 141.434337] RAX: ffffffffffffffda RBX: 000055fde256dd00 RCX: 00007ff04cc3a05b > > [ 141.435171] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055fde256dd68 > > [ 141.435973] RBP: 000055fde256dd68 R08: 00007ffd891bbae1 R09: 0000000000000000 > > [ 141.436852] R10: 00007ff04cd71480 R11: 0000000000000206 R12: 0000000000000000 > > [ 141.437636] R13: 0000000000000000 R14: 000055fde256dd68 R15: 0000000000000000 > > [ 141.438442] </TASK> > > > > > > Corinna > _______________________________________________ > Intel-wired-lan mailing list > Intel-wired-lan@osuosl.org > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") 2023-03-07 10:19 ` Paul Menzel 2023-03-07 10:36 ` Lin Ma @ 2023-03-07 11:48 ` Lin Ma 2023-03-07 13:05 ` [Intel-wired-lan] [PATCH] igb: revert rtnl_lock() that causes deadlock Lin Ma 1 sibling, 1 reply; 16+ messages in thread From: Lin Ma @ 2023-03-07 11:48 UTC (permalink / raw) To: Paul Menzel; +Cc: intel-wired-lan, Corinna Vinschen, regressions Hello Paul and Corinna, Since I didn't have the exact ens5f2 driver as you guys, I just test the module loading and unloading in my QEMU environment. I turned on the CONFIG_PROVE_LOCKING option but failed to reproduce the deadlock. root@syzkaller:/# insmod igb.ko [ 116.192730] igb: Intel(R) Gigabit Ethernet Network Driver [ 116.193128] igb: Copyright (c) 2007-2014 Intel Corporation. root@syzkaller:/# lsmod Module Size Used by igb 225280 0 root@syzkaller:/# rmmod igb rmmod: ERROR: ../libkmod/libkmod.c:514 lookup_builtin_file() could not open builtin file '/lib/modules/6.1.0/modules.builtin.bin' root@syzkaller:/# lsmod lsmod Module Size Used by I guess the command "echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs" must do something interesting but I didn't find the relevant code there. Could you guys take a look? Regards Lin > From: "Paul Menzel" <pmenzel@molgen.mpg.de> > Sent Time: 2023-03-07 18:19:18 (Tuesday) > To: "Lin Ma" <linma@zju.edu.cn>, "Corinna Vinschen" <vinschen@redhat.com> > Cc: intel-wired-lan@lists.osuosl.org, regressions@lists.linux.dev > Subject: Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") > > [Add Corinna back to Cc:, who was removed by Mozilla Thunderbird because > Reply-To header was set for some reason in the original report.] > > Am 07.03.23 um 11:12 schrieb Paul Menzel: > > [Adding regressions@lists.linux.dev] > > > > #regzbot ^introduced: 6faee3d4ee8b > > > > Am 07.03.23 um 10:54 schrieb Corinna Vinschen: > >> Hi, > >> > >> > >> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see > >> a deadlock scenario when trying to unload the igb module. > >> > >> The reproducer is pretty simple: > >> > >> # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs > >> # modprobe -r igb > >> > >> The hang is quite thorough, I assume it's suffering a deadlock between > >> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b. > >> > >> Any chance you could have a look? > >> > >> > >> Thanks, > >> Corinna _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
* [Intel-wired-lan] [PATCH] igb: revert rtnl_lock() that causes deadlock 2023-03-07 11:48 ` Lin Ma @ 2023-03-07 13:05 ` Lin Ma 2023-03-07 13:17 ` Linux regression tracking (Thorsten Leemhuis) 2023-03-07 13:45 ` Corinna Vinschen 0 siblings, 2 replies; 16+ messages in thread From: Lin Ma @ 2023-03-07 13:05 UTC (permalink / raw) To: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba, pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk, john.fastabend Cc: pmenzel, vinschen, regressions, stable, intel-wired-lan, Lin Ma The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds rtnl_lock to eliminate a false data race shown below (FREE from device detaching) | (USE from netdev core) igb_remove | igb_ndo_get_vf_config igb_disable_sriov | vf >= adapter->vfs_allocated_count? kfree(adapter->vf_data) | adapter->vfs_allocated_count = 0 | | memcpy(... adapter->vf_data[vf] The above race will never happen and the extra rtnl_lock causes deadlock below [ 141.420169] <TASK> [ 141.420672] __schedule+0x2dd/0x840 [ 141.421427] schedule+0x50/0xc0 [ 141.422041] schedule_preempt_disabled+0x11/0x20 [ 141.422678] __mutex_lock.isra.13+0x431/0x6b0 [ 141.423324] unregister_netdev+0xe/0x20 [ 141.423578] igbvf_remove+0x45/0xe0 [igbvf] [ 141.423791] pci_device_remove+0x36/0xb0 [ 141.423990] device_release_driver_internal+0xc1/0x160 [ 141.424270] pci_stop_bus_device+0x6d/0x90 [ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20 [ 141.424789] pci_iov_remove_virtfn+0xba/0x120 [ 141.425452] sriov_disable+0x2f/0xf0 [ 141.425679] igb_disable_sriov+0x4e/0x100 [igb] [ 141.426353] igb_remove+0xa0/0x130 [igb] [ 141.426599] pci_device_remove+0x36/0xb0 [ 141.426796] device_release_driver_internal+0xc1/0x160 [ 141.427060] driver_detach+0x44/0x90 [ 141.427253] bus_remove_driver+0x55/0xe0 [ 141.427477] pci_unregister_driver+0x2a/0xa0 [ 141.428296] __x64_sys_delete_module+0x141/0x2b0 [ 141.429126] ? mntput_no_expire+0x4a/0x240 [ 141.429363] ? syscall_trace_enter.isra.19+0x126/0x1a0 [ 141.429653] do_syscall_64+0x5b/0x80 [ 141.429847] ? exit_to_user_mode_prepare+0x14d/0x1c0 [ 141.430109] ? syscall_exit_to_user_mode+0x12/0x30 [ 141.430849] ? do_syscall_64+0x67/0x80 [ 141.431083] ? syscall_exit_to_user_mode_prepare+0x183/0x1b0 [ 141.431770] ? syscall_exit_to_user_mode+0x12/0x30 [ 141.432482] ? do_syscall_64+0x67/0x80 [ 141.432714] ? exc_page_fault+0x64/0x140 [ 141.432911] entry_SYSCALL_64_after_hwframe+0x72/0xdc Since the igb_disable_sriov() will call pci_disable_sriov() before releasing any resources, the netdev core will synchronize the cleanup to avoid any races. This patch removes the useless rtnl_(un)lock to guarantee correctness. CC: stable@vger.kernel.org Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race") Reported-by: Corinna <vinschen@redhat.com> Link: https://lore.kernel.org/regressions/3ef31c0b-ce40-20d0-7740-5dc0cca278ca@molgen.mpg.de/ Signed-off-by: Lin Ma <linma@zju.edu.cn> --- drivers/net/ethernet/intel/igb/igb_main.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 03bc1e8af575..5532361b0e94 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3863,9 +3863,7 @@ static void igb_remove(struct pci_dev *pdev) igb_release_hw_control(adapter); #ifdef CONFIG_PCI_IOV - rtnl_lock(); igb_disable_sriov(pdev); - rtnl_unlock(); #endif unregister_netdev(netdev); -- 2.34.1 _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [Intel-wired-lan] [PATCH] igb: revert rtnl_lock() that causes deadlock 2023-03-07 13:05 ` [Intel-wired-lan] [PATCH] igb: revert rtnl_lock() that causes deadlock Lin Ma @ 2023-03-07 13:17 ` Linux regression tracking (Thorsten Leemhuis) 2023-03-07 13:45 ` Corinna Vinschen 1 sibling, 0 replies; 16+ messages in thread From: Linux regression tracking (Thorsten Leemhuis) @ 2023-03-07 13:17 UTC (permalink / raw) To: Lin Ma, jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba, pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk, john.fastabend Cc: pmenzel, vinschen, intel-wired-lan, regressions, stable On 07.03.23 14:05, Lin Ma wrote: > The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds > rtnl_lock to eliminate a false data race shown below > > (FREE from device detaching) | (USE from netdev core) > igb_remove | igb_ndo_get_vf_config > igb_disable_sriov | vf >= adapter->vfs_allocated_count? > kfree(adapter->vf_data) | > adapter->vfs_allocated_count = 0 | > | memcpy(... adapter->vf_data[vf] > > [...] > CC: stable@vger.kernel.org > Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race") > Reported-by: Corinna <vinschen@redhat.com> > Link: https://lore.kernel.org/regressions/3ef31c0b-ce40-20d0-7740-5dc0cca278ca@molgen.mpg.de/ FWIW, that afaics should be: Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/ (that's the parent of the mail above) > Signed-off-by: Lin Ma <linma@zju.edu.cn> Ciao, Thorsten _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Intel-wired-lan] [PATCH] igb: revert rtnl_lock() that causes deadlock 2023-03-07 13:05 ` [Intel-wired-lan] [PATCH] igb: revert rtnl_lock() that causes deadlock Lin Ma 2023-03-07 13:17 ` Linux regression tracking (Thorsten Leemhuis) @ 2023-03-07 13:45 ` Corinna Vinschen 2023-03-07 15:29 ` [Intel-wired-lan] [PATCH v2] " Lin Ma 1 sibling, 1 reply; 16+ messages in thread From: Corinna Vinschen @ 2023-03-07 13:45 UTC (permalink / raw) To: Lin Ma Cc: pmenzel, ast, hawk, daniel, netdev, richardcochran, john.fastabend, jesse.brandeburg, stable, linux-kernel, edumazet, anthony.l.nguyen, intel-wired-lan, kuba, pabeni, davem, regressions On Mar 7 21:05, Lin Ma wrote: > The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds > rtnl_lock to eliminate a false data race shown below > > (FREE from device detaching) | (USE from netdev core) > igb_remove | igb_ndo_get_vf_config > igb_disable_sriov | vf >= adapter->vfs_allocated_count? > kfree(adapter->vf_data) | > adapter->vfs_allocated_count = 0 | > | memcpy(... adapter->vf_data[vf] > > The above race will never happen and the extra rtnl_lock causes deadlock > below > [...] > CC: stable@vger.kernel.org > Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race") > Reported-by: Corinna <vinschen@redhat.com> Thank you, but "Corinna Vinschen", please. Thanks, Corinna _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
* [Intel-wired-lan] [PATCH v2] igb: revert rtnl_lock() that causes deadlock 2023-03-07 13:45 ` Corinna Vinschen @ 2023-03-07 15:29 ` Lin Ma 2023-03-07 16:27 ` Corinna Vinschen ` (3 more replies) 0 siblings, 4 replies; 16+ messages in thread From: Lin Ma @ 2023-03-07 15:29 UTC (permalink / raw) To: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba, pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk, john.fastabend Cc: pmenzel, vinschen, regressions, stable, intel-wired-lan, Lin Ma The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds rtnl_lock to eliminate a false data race shown below (FREE from device detaching) | (USE from netdev core) igb_remove | igb_ndo_get_vf_config igb_disable_sriov | vf >= adapter->vfs_allocated_count? kfree(adapter->vf_data) | adapter->vfs_allocated_count = 0 | | memcpy(... adapter->vf_data[vf] The above race will never happen and the extra rtnl_lock causes deadlock below [ 141.420169] <TASK> [ 141.420672] __schedule+0x2dd/0x840 [ 141.421427] schedule+0x50/0xc0 [ 141.422041] schedule_preempt_disabled+0x11/0x20 [ 141.422678] __mutex_lock.isra.13+0x431/0x6b0 [ 141.423324] unregister_netdev+0xe/0x20 [ 141.423578] igbvf_remove+0x45/0xe0 [igbvf] [ 141.423791] pci_device_remove+0x36/0xb0 [ 141.423990] device_release_driver_internal+0xc1/0x160 [ 141.424270] pci_stop_bus_device+0x6d/0x90 [ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20 [ 141.424789] pci_iov_remove_virtfn+0xba/0x120 [ 141.425452] sriov_disable+0x2f/0xf0 [ 141.425679] igb_disable_sriov+0x4e/0x100 [igb] [ 141.426353] igb_remove+0xa0/0x130 [igb] [ 141.426599] pci_device_remove+0x36/0xb0 [ 141.426796] device_release_driver_internal+0xc1/0x160 [ 141.427060] driver_detach+0x44/0x90 [ 141.427253] bus_remove_driver+0x55/0xe0 [ 141.427477] pci_unregister_driver+0x2a/0xa0 [ 141.428296] __x64_sys_delete_module+0x141/0x2b0 [ 141.429126] ? mntput_no_expire+0x4a/0x240 [ 141.429363] ? syscall_trace_enter.isra.19+0x126/0x1a0 [ 141.429653] do_syscall_64+0x5b/0x80 [ 141.429847] ? exit_to_user_mode_prepare+0x14d/0x1c0 [ 141.430109] ? syscall_exit_to_user_mode+0x12/0x30 [ 141.430849] ? do_syscall_64+0x67/0x80 [ 141.431083] ? syscall_exit_to_user_mode_prepare+0x183/0x1b0 [ 141.431770] ? syscall_exit_to_user_mode+0x12/0x30 [ 141.432482] ? do_syscall_64+0x67/0x80 [ 141.432714] ? exc_page_fault+0x64/0x140 [ 141.432911] entry_SYSCALL_64_after_hwframe+0x72/0xdc Since the igb_disable_sriov() will call pci_disable_sriov() before releasing any resources, the netdev core will synchronize the cleanup to avoid any races. This patch removes the useless rtnl_(un)lock to guarantee correctness. CC: stable@vger.kernel.org Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race") Reported-by: Corinna Vinschen <vinschen@redhat.com> Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/ Signed-off-by: Lin Ma <linma@zju.edu.cn> --- V1->V2: update Link and correct Corinna's name drivers/net/ethernet/intel/igb/igb_main.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 03bc1e8af575..5532361b0e94 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3863,9 +3863,7 @@ static void igb_remove(struct pci_dev *pdev) igb_release_hw_control(adapter); #ifdef CONFIG_PCI_IOV - rtnl_lock(); igb_disable_sriov(pdev); - rtnl_unlock(); #endif unregister_netdev(netdev); -- 2.34.1 _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [Intel-wired-lan] [PATCH v2] igb: revert rtnl_lock() that causes deadlock 2023-03-07 15:29 ` [Intel-wired-lan] [PATCH v2] " Lin Ma @ 2023-03-07 16:27 ` Corinna Vinschen 2023-03-07 23:22 ` Jacob Keller ` (2 subsequent siblings) 3 siblings, 0 replies; 16+ messages in thread From: Corinna Vinschen @ 2023-03-07 16:27 UTC (permalink / raw) To: Lin Ma Cc: pmenzel, ast, hawk, daniel, netdev, richardcochran, john.fastabend, jesse.brandeburg, stable, linux-kernel, edumazet, anthony.l.nguyen, intel-wired-lan, kuba, pabeni, davem, regressions On Mar 7 23:29, Lin Ma wrote: > The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds > rtnl_lock to eliminate a false data race shown below > > (FREE from device detaching) | (USE from netdev core) > igb_remove | igb_ndo_get_vf_config > igb_disable_sriov | vf >= adapter->vfs_allocated_count? > kfree(adapter->vf_data) | > adapter->vfs_allocated_count = 0 | > | memcpy(... adapter->vf_data[vf] > > The above race will never happen and the extra rtnl_lock causes deadlock > below > > [ 141.420169] <TASK> > [ 141.420672] __schedule+0x2dd/0x840 > [ 141.421427] schedule+0x50/0xc0 > [ 141.422041] schedule_preempt_disabled+0x11/0x20 > [ 141.422678] __mutex_lock.isra.13+0x431/0x6b0 > [ 141.423324] unregister_netdev+0xe/0x20 > [ 141.423578] igbvf_remove+0x45/0xe0 [igbvf] > [ 141.423791] pci_device_remove+0x36/0xb0 > [ 141.423990] device_release_driver_internal+0xc1/0x160 > [ 141.424270] pci_stop_bus_device+0x6d/0x90 > [ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20 > [ 141.424789] pci_iov_remove_virtfn+0xba/0x120 > [ 141.425452] sriov_disable+0x2f/0xf0 > [ 141.425679] igb_disable_sriov+0x4e/0x100 [igb] > [ 141.426353] igb_remove+0xa0/0x130 [igb] > [ 141.426599] pci_device_remove+0x36/0xb0 > [ 141.426796] device_release_driver_internal+0xc1/0x160 > [ 141.427060] driver_detach+0x44/0x90 > [ 141.427253] bus_remove_driver+0x55/0xe0 > [ 141.427477] pci_unregister_driver+0x2a/0xa0 > [ 141.428296] __x64_sys_delete_module+0x141/0x2b0 > [ 141.429126] ? mntput_no_expire+0x4a/0x240 > [ 141.429363] ? syscall_trace_enter.isra.19+0x126/0x1a0 > [ 141.429653] do_syscall_64+0x5b/0x80 > [ 141.429847] ? exit_to_user_mode_prepare+0x14d/0x1c0 > [ 141.430109] ? syscall_exit_to_user_mode+0x12/0x30 > [ 141.430849] ? do_syscall_64+0x67/0x80 > [ 141.431083] ? syscall_exit_to_user_mode_prepare+0x183/0x1b0 > [ 141.431770] ? syscall_exit_to_user_mode+0x12/0x30 > [ 141.432482] ? do_syscall_64+0x67/0x80 > [ 141.432714] ? exc_page_fault+0x64/0x140 > [ 141.432911] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > Since the igb_disable_sriov() will call pci_disable_sriov() before > releasing any resources, the netdev core will synchronize the cleanup to > avoid any races. This patch removes the useless rtnl_(un)lock to guarantee > correctness. > > CC: stable@vger.kernel.org > Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race") > Reported-by: Corinna Vinschen <vinschen@redhat.com> > Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/ > Signed-off-by: Lin Ma <linma@zju.edu.cn> Tested-by: Corinna Vinschen <vinschen@redhat.com> Thanks, Corinna _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Intel-wired-lan] [PATCH v2] igb: revert rtnl_lock() that causes deadlock 2023-03-07 15:29 ` [Intel-wired-lan] [PATCH v2] " Lin Ma 2023-03-07 16:27 ` Corinna Vinschen @ 2023-03-07 23:22 ` Jacob Keller 2023-03-08 12:04 ` Simon Horman 2023-03-16 9:10 ` Romanowski, Rafal 3 siblings, 0 replies; 16+ messages in thread From: Jacob Keller @ 2023-03-07 23:22 UTC (permalink / raw) To: intel-wired-lan On 3/7/2023 7:29 AM, Lin Ma wrote: > The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds > rtnl_lock to eliminate a false data race shown below > This patch does a number of other changes as well, including its own spinlock, so this is only a partial revert. Ok. > (FREE from device detaching) | (USE from netdev core) > igb_remove | igb_ndo_get_vf_config > igb_disable_sriov | vf >= adapter->vfs_allocated_count? > kfree(adapter->vf_data) | > adapter->vfs_allocated_count = 0 | > | memcpy(... adapter->vf_data[vf] > How was this data race detected? KCSAN? > The above race will never happen and the extra rtnl_lock causes deadlock > below So this was a theoretical data race but in practice hasn't occurred? > > [ 141.420169] <TASK> > [ 141.420672] __schedule+0x2dd/0x840 > [ 141.421427] schedule+0x50/0xc0 > [ 141.422041] schedule_preempt_disabled+0x11/0x20 > [ 141.422678] __mutex_lock.isra.13+0x431/0x6b0 > [ 141.423324] unregister_netdev+0xe/0x20 > [ 141.423578] igbvf_remove+0x45/0xe0 [igbvf] > [ 141.423791] pci_device_remove+0x36/0xb0 > [ 141.423990] device_release_driver_internal+0xc1/0x160 > [ 141.424270] pci_stop_bus_device+0x6d/0x90 > [ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20 > [ 141.424789] pci_iov_remove_virtfn+0xba/0x120 > [ 141.425452] sriov_disable+0x2f/0xf0 > [ 141.425679] igb_disable_sriov+0x4e/0x100 [igb] > [ 141.426353] igb_remove+0xa0/0x130 [igb] > [ 141.426599] pci_device_remove+0x36/0xb0 > [ 141.426796] device_release_driver_internal+0xc1/0x160 > [ 141.427060] driver_detach+0x44/0x90 > [ 141.427253] bus_remove_driver+0x55/0xe0 > [ 141.427477] pci_unregister_driver+0x2a/0xa0 > [ 141.428296] __x64_sys_delete_module+0x141/0x2b0 > [ 141.429126] ? mntput_no_expire+0x4a/0x240 > [ 141.429363] ? syscall_trace_enter.isra.19+0x126/0x1a0 > [ 141.429653] do_syscall_64+0x5b/0x80 > [ 141.429847] ? exit_to_user_mode_prepare+0x14d/0x1c0 > [ 141.430109] ? syscall_exit_to_user_mode+0x12/0x30 > [ 141.430849] ? do_syscall_64+0x67/0x80 > [ 141.431083] ? syscall_exit_to_user_mode_prepare+0x183/0x1b0 > [ 141.431770] ? syscall_exit_to_user_mode+0x12/0x30 > [ 141.432482] ? do_syscall_64+0x67/0x80 > [ 141.432714] ? exc_page_fault+0x64/0x140 > [ 141.432911] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > Since the igb_disable_sriov() will call pci_disable_sriov() before > releasing any resources, the netdev core will synchronize the cleanup to > avoid any races. This patch removes the useless rtnl_(un)lock to guarantee > correctness. > you could say "harmful" rather than "useless" here. igb is holding the RTNL lock while calling igb_disable_sriov which then tries to remove the VF driver which then takes the RTNL lock and deadlocks. This was possibly not caught before because if the igbvf device was in a VM, this would not deadlock since the VF driver would not be loaded in host, and would thus not acquire the host RTNL lock. Ok. This makes sense how it might not have been caught before. The original commit referenced that the RTNL lock was added similar to 719479230893 ("dpaa2-eth: add MAC/PHY support through phylink") so I wanted to check if that might also be buggy. Turns out no, its actually seemingly unrelated to this broken use of RTNL lock as its holding something for disconnecting a phylink object, but has nothing to do with SR-IOV. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Thanks, Jake > CC: stable@vger.kernel.org > Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race") > Reported-by: Corinna Vinschen <vinschen@redhat.com> > Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/ > Signed-off-by: Lin Ma <linma@zju.edu.cn> > --- > V1->V2: update Link and correct Corinna's name > drivers/net/ethernet/intel/igb/igb_main.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c > index 03bc1e8af575..5532361b0e94 100644 > --- a/drivers/net/ethernet/intel/igb/igb_main.c > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > @@ -3863,9 +3863,7 @@ static void igb_remove(struct pci_dev *pdev) > igb_release_hw_control(adapter); > > #ifdef CONFIG_PCI_IOV > - rtnl_lock(); > igb_disable_sriov(pdev); > - rtnl_unlock(); > #endif > > unregister_netdev(netdev); _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Intel-wired-lan] [PATCH v2] igb: revert rtnl_lock() that causes deadlock 2023-03-07 15:29 ` [Intel-wired-lan] [PATCH v2] " Lin Ma 2023-03-07 16:27 ` Corinna Vinschen 2023-03-07 23:22 ` Jacob Keller @ 2023-03-08 12:04 ` Simon Horman 2023-03-16 9:10 ` Romanowski, Rafal 3 siblings, 0 replies; 16+ messages in thread From: Simon Horman @ 2023-03-08 12:04 UTC (permalink / raw) To: Lin Ma Cc: pmenzel, ast, hawk, daniel, netdev, richardcochran, john.fastabend, jesse.brandeburg, stable, linux-kernel, vinschen, edumazet, anthony.l.nguyen, intel-wired-lan, kuba, pabeni, davem, regressions On Tue, Mar 07, 2023 at 11:29:17PM +0800, Lin Ma wrote: > The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds > rtnl_lock to eliminate a false data race shown below > > (FREE from device detaching) | (USE from netdev core) > igb_remove | igb_ndo_get_vf_config > igb_disable_sriov | vf >= adapter->vfs_allocated_count? > kfree(adapter->vf_data) | > adapter->vfs_allocated_count = 0 | > | memcpy(... adapter->vf_data[vf] > > The above race will never happen and the extra rtnl_lock causes deadlock > below > > [ 141.420169] <TASK> > [ 141.420672] __schedule+0x2dd/0x840 > [ 141.421427] schedule+0x50/0xc0 > [ 141.422041] schedule_preempt_disabled+0x11/0x20 > [ 141.422678] __mutex_lock.isra.13+0x431/0x6b0 > [ 141.423324] unregister_netdev+0xe/0x20 > [ 141.423578] igbvf_remove+0x45/0xe0 [igbvf] > [ 141.423791] pci_device_remove+0x36/0xb0 > [ 141.423990] device_release_driver_internal+0xc1/0x160 > [ 141.424270] pci_stop_bus_device+0x6d/0x90 > [ 141.424507] pci_stop_and_remove_bus_device+0xe/0x20 > [ 141.424789] pci_iov_remove_virtfn+0xba/0x120 > [ 141.425452] sriov_disable+0x2f/0xf0 > [ 141.425679] igb_disable_sriov+0x4e/0x100 [igb] > [ 141.426353] igb_remove+0xa0/0x130 [igb] > [ 141.426599] pci_device_remove+0x36/0xb0 > [ 141.426796] device_release_driver_internal+0xc1/0x160 > [ 141.427060] driver_detach+0x44/0x90 > [ 141.427253] bus_remove_driver+0x55/0xe0 > [ 141.427477] pci_unregister_driver+0x2a/0xa0 > [ 141.428296] __x64_sys_delete_module+0x141/0x2b0 > [ 141.429126] ? mntput_no_expire+0x4a/0x240 > [ 141.429363] ? syscall_trace_enter.isra.19+0x126/0x1a0 > [ 141.429653] do_syscall_64+0x5b/0x80 > [ 141.429847] ? exit_to_user_mode_prepare+0x14d/0x1c0 > [ 141.430109] ? syscall_exit_to_user_mode+0x12/0x30 > [ 141.430849] ? do_syscall_64+0x67/0x80 > [ 141.431083] ? syscall_exit_to_user_mode_prepare+0x183/0x1b0 > [ 141.431770] ? syscall_exit_to_user_mode+0x12/0x30 > [ 141.432482] ? do_syscall_64+0x67/0x80 > [ 141.432714] ? exc_page_fault+0x64/0x140 > [ 141.432911] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > Since the igb_disable_sriov() will call pci_disable_sriov() before > releasing any resources, the netdev core will synchronize the cleanup to > avoid any races. This patch removes the useless rtnl_(un)lock to guarantee > correctness. > > CC: stable@vger.kernel.org > Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race") > Reported-by: Corinna Vinschen <vinschen@redhat.com> > Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/ > Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Simon Horman <simon.horman@corigine.com> _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Intel-wired-lan] [PATCH v2] igb: revert rtnl_lock() that causes deadlock 2023-03-07 15:29 ` [Intel-wired-lan] [PATCH v2] " Lin Ma ` (2 preceding siblings ...) 2023-03-08 12:04 ` Simon Horman @ 2023-03-16 9:10 ` Romanowski, Rafal 3 siblings, 0 replies; 16+ messages in thread From: Romanowski, Rafal @ 2023-03-16 9:10 UTC (permalink / raw) To: intel-wired-lan > -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of > Lin Ma > Sent: wtorek, 7 marca 2023 16:29 > To: Brandeburg, Jesse <jesse.brandeburg@intel.com>; Nguyen, Anthony L > <anthony.l.nguyen@intel.com>; davem@davemloft.net; > edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > richardcochran@gmail.com; ast@kernel.org; daniel@iogearbox.net; > hawk@kernel.org; john.fastabend@gmail.com > Cc: pmenzel@molgen.mpg.de; Vinschen, Corinna <vinschen@redhat.com>; > regressions@lists.linux.dev; stable@vger.kernel.org; intel-wired- > lan@lists.osuosl.org; Lin Ma <linma@zju.edu.cn> > Subject: [Intel-wired-lan] [PATCH v2] igb: revert rtnl_lock() that causes > deadlock > > The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds rtnl_lock > to eliminate a false data race shown below > > (FREE from device detaching) | (USE from netdev core) > igb_remove | igb_ndo_get_vf_config > igb_disable_sriov | vf >= adapter->vfs_allocated_count? > kfree(adapter->vf_data) | > adapter->vfs_allocated_count = 0 | > | memcpy(... adapter->vf_data[vf] > > The above race will never happen and the extra rtnl_lock causes deadlock > below > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c > b/drivers/net/ethernet/intel/igb/igb_main.c > index 03bc1e8af575..5532361b0e94 100644 > --- a/drivers/net/ethernet/intel/igb/igb_main.c > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > @@ -3863,9 +3863,7 @@ static void igb_remove(struct pci_dev *pdev) > igb_release_hw_control(adapter); > > #ifdef CONFIG_PCI_IOV > - rtnl_lock(); > igb_disable_sriov(pdev); > - rtnl_unlock(); > #endif > > unregister_netdev(netdev); > -- Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-03-16 9:11 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-03-07 9:54 [Intel-wired-lan] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") Corinna Vinschen 2023-03-07 10:12 ` [Intel-wired-lan] [REGRESSION] " Paul Menzel 2023-03-07 10:19 ` Paul Menzel 2023-03-07 10:36 ` Lin Ma 2023-03-07 12:11 ` Corinna Vinschen 2023-03-07 12:37 ` Lin Ma 2023-03-07 12:55 ` Corinna Vinschen 2023-03-07 11:48 ` Lin Ma 2023-03-07 13:05 ` [Intel-wired-lan] [PATCH] igb: revert rtnl_lock() that causes deadlock Lin Ma 2023-03-07 13:17 ` Linux regression tracking (Thorsten Leemhuis) 2023-03-07 13:45 ` Corinna Vinschen 2023-03-07 15:29 ` [Intel-wired-lan] [PATCH v2] " Lin Ma 2023-03-07 16:27 ` Corinna Vinschen 2023-03-07 23:22 ` Jacob Keller 2023-03-08 12:04 ` Simon Horman 2023-03-16 9:10 ` Romanowski, Rafal
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).