regressions.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
       [not found] <ZAcJvkEPqWeJHO2r@calimero.vinschen.de>
@ 2023-03-07 10:12 ` Paul Menzel
  2023-03-07 10:19   ` Paul Menzel
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Menzel @ 2023-03-07 10:12 UTC (permalink / raw)
  To: Lin Ma; +Cc: intel-wired-lan, regressions

[Adding regressions@lists.linux.dev]

#regzbot ^introduced: 6faee3d4ee8b

Am 07.03.23 um 10:54 schrieb Corinna Vinschen:
> Hi,
> 
> 
> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see
> a deadlock scenario when trying to unload the igb module.
> 
> The reproducer is pretty simple:
> 
>    # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
>    # modprobe -r igb
> 
> The hang is quite thorough, I assume it's suffering a deadlock between
> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b.
> 
> Any chance you could have a look?
> 
> 
> Thanks,
> Corinna

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
  2023-03-07 10:12 ` [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") Paul Menzel
@ 2023-03-07 10:19   ` Paul Menzel
  2023-03-07 10:36     ` Lin Ma
  2023-03-07 11:48     ` Lin Ma
  0 siblings, 2 replies; 13+ messages in thread
From: Paul Menzel @ 2023-03-07 10:19 UTC (permalink / raw)
  To: Lin Ma, Corinna Vinschen; +Cc: intel-wired-lan, regressions

[Add Corinna back to Cc:, who was removed by Mozilla Thunderbird because 
Reply-To header was set for some reason in the original report.]

Am 07.03.23 um 11:12 schrieb Paul Menzel:
> [Adding regressions@lists.linux.dev]
> 
> #regzbot ^introduced: 6faee3d4ee8b
> 
> Am 07.03.23 um 10:54 schrieb Corinna Vinschen:
>> Hi,
>>
>>
>> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see
>> a deadlock scenario when trying to unload the igb module.
>>
>> The reproducer is pretty simple:
>>
>>    # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
>>    # modprobe -r igb
>>
>> The hang is quite thorough, I assume it's suffering a deadlock between
>> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b.
>>
>> Any chance you could have a look?
>>
>>
>> Thanks,
>> Corinna

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
  2023-03-07 10:19   ` Paul Menzel
@ 2023-03-07 10:36     ` Lin Ma
  2023-03-07 12:11       ` Corinna Vinschen
  2023-03-07 11:48     ` Lin Ma
  1 sibling, 1 reply; 13+ messages in thread
From: Lin Ma @ 2023-03-07 10:36 UTC (permalink / raw)
  To: Paul Menzel; +Cc: Corinna Vinschen, intel-wired-lan, regressions

Hello there

Yeah I am looking at it. Could you please offer the crash log or locking debug message if accessible? Thanks in advance.

Regards
Lin

>[Add Corinna back to Cc:, who was removed by Mozilla Thunderbird because 
>Reply-To header was set for some reason in the original report.]
>
>Am 07.03.23 um 11:12 schrieb Paul Menzel:
>> [Adding regressions@lists.linux.dev]
>> 
>> #regzbot ^introduced: 6faee3d4ee8b
>> 
>> Am 07.03.23 um 10:54 schrieb Corinna Vinschen:
>>> Hi,
>>>
>>>
>>> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see
>>> a deadlock scenario when trying to unload the igb module.
>>>
>>> The reproducer is pretty simple:
>>>
>>>    # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
>>>    # modprobe -r igb
>>>
>>> The hang is quite thorough, I assume it's suffering a deadlock between
>>> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b.
>>>
>>> Any chance you could have a look?
>>>
>>>
>>> Thanks,
>>> Corinna

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
  2023-03-07 10:19   ` Paul Menzel
  2023-03-07 10:36     ` Lin Ma
@ 2023-03-07 11:48     ` Lin Ma
  2023-03-07 13:05       ` [PATCH] igb: revert rtnl_lock() that causes deadlock Lin Ma
  1 sibling, 1 reply; 13+ messages in thread
From: Lin Ma @ 2023-03-07 11:48 UTC (permalink / raw)
  To: Paul Menzel; +Cc: Corinna Vinschen, intel-wired-lan, regressions

Hello Paul and Corinna,

Since I didn't have the exact ens5f2 driver as you guys, I just test the module loading and unloading in my QEMU environment.
I turned on the CONFIG_PROVE_LOCKING option but failed to reproduce the deadlock.

root@syzkaller:/# insmod igb.ko
[  116.192730] igb: Intel(R) Gigabit Ethernet Network Driver
[  116.193128] igb: Copyright (c) 2007-2014 Intel Corporation.
root@syzkaller:/# lsmod
Module                  Size  Used by
igb                   225280  0
root@syzkaller:/# rmmod igb
rmmod: ERROR: ../libkmod/libkmod.c:514 lookup_builtin_file() could not open builtin file '/lib/modules/6.1.0/modules.builtin.bin'
root@syzkaller:/# lsmod
lsmod
Module                  Size  Used by

I guess the command "echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs" must do something interesting but I
didn't find the relevant code there. Could you guys take a look?

Regards
Lin



> From: "Paul Menzel" <pmenzel@molgen.mpg.de>
> Sent Time: 2023-03-07 18:19:18 (Tuesday)
> To: "Lin Ma" <linma@zju.edu.cn>, "Corinna Vinschen" <vinschen@redhat.com>
> Cc: intel-wired-lan@lists.osuosl.org, regressions@lists.linux.dev
> Subject: Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
> 
> [Add Corinna back to Cc:, who was removed by Mozilla Thunderbird because 
> Reply-To header was set for some reason in the original report.]
> 
> Am 07.03.23 um 11:12 schrieb Paul Menzel:
> > [Adding regressions@lists.linux.dev]
> > 
> > #regzbot ^introduced: 6faee3d4ee8b
> > 
> > Am 07.03.23 um 10:54 schrieb Corinna Vinschen:
> >> Hi,
> >>
> >>
> >> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see
> >> a deadlock scenario when trying to unload the igb module.
> >>
> >> The reproducer is pretty simple:
> >>
> >>    # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
> >>    # modprobe -r igb
> >>
> >> The hang is quite thorough, I assume it's suffering a deadlock between
> >> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b.
> >>
> >> Any chance you could have a look?
> >>
> >>
> >> Thanks,
> >> Corinna

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
  2023-03-07 10:36     ` Lin Ma
@ 2023-03-07 12:11       ` Corinna Vinschen
  2023-03-07 12:37         ` Lin Ma
  0 siblings, 1 reply; 13+ messages in thread
From: Corinna Vinschen @ 2023-03-07 12:11 UTC (permalink / raw)
  To: Lin Ma; +Cc: Paul Menzel, intel-wired-lan, regressions

Hi Lin,

On Mar  7 18:36, Lin Ma wrote:
> Hello there
> 
> Yeah I am looking at it. Could you please offer the crash log or
> locking debug message if accessible? Thanks in advance.

The commands used to reroduce and the resulting log message:

  # echo 10 > /proc/sys/kernel/hung_task_timeout_secs
  # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
  # modprobe -r igb
  [hang]

console log:

[  116.914656] pci 0000:84:10.2: [8086:1520] type 00 class 0x020000
[  116.915722] pci 0000:84:10.6: [8086:1520] type 00 class 0x020000
[  116.917013] igb 0000:84:00.2: 2 VFs allocated
[  116.978350] igbvf: Intel(R) Gigabit Virtual Function Network Driver
[  116.979072] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[  116.980253] igbvf 0000:84:10.2: enabling device (0000 -> 0002)
[  116.982356] igbvf 0000:84:10.2: PF still in reset state. Is the PF interface up?
[  116.983196] igbvf 0000:84:10.2: Assigning random MAC address.
[  116.985058] igbvf 0000:84:10.2: PF still resetting
[  117.011054] igbvf 0000:84:10.2: Intel(R) I350 Virtual Function
[  117.011785] igbvf 0000:84:10.2: Address: c2:e5:c2:a2:75:00
[  117.012189] igbvf 0000:84:10.6: enabling device (0000 -> 0002)
[  117.023911] igbvf 0000:84:10.6: PF still in reset state. Is the PF interface up?
[  117.024724] igbvf 0000:84:10.6: Assigning random MAC address.
[  117.036215] igbvf 0000:84:10.6: PF still resetting
[  117.037596] igbvf 0000:84:10.6: Intel(R) I350 Virtual Function
[  117.038327] igbvf 0000:84:10.6: Address: ea:74:07:f4:28:7c
[  117.047970] igbvf 0000:84:10.6 ens5f2v1: renamed from eth1
[  117.062847] igbvf 0000:84:10.2 ens5f2v0: renamed from eth0
[  117.080725] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
[  117.106106] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
[  117.107189] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
[  127.361316] igb 0000:84:00.3: removed PHC on ens5f3
[  127.361975] igb 0000:84:00.3: DCA disabled
[  127.483085] igb 0000:84:00.2: removed PHC on ens5f2
[  127.483786] igb 0000:84:00.2: DCA disabled
[  141.418410] INFO: task modprobe:2078 blocked for more than 10 seconds.
[  141.418856]       Not tainted 6.2.0-rc8+ #12
[  141.419184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  141.419616] task:modprobe        state:D stack:0     pid:2078  ppid:2037   flags:0x00004000
[  141.420039] Call Trace:
[  141.420169]  <TASK>
[  141.420672]  __schedule+0x2dd/0x840
[  141.421427]  schedule+0x50/0xc0
[  141.422041]  schedule_preempt_disabled+0x11/0x20
[  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
[  141.423324]  unregister_netdev+0xe/0x20
[  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
[  141.423791]  pci_device_remove+0x36/0xb0
[  141.423990]  device_release_driver_internal+0xc1/0x160
[  141.424270]  pci_stop_bus_device+0x6d/0x90
[  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
[  141.424789]  pci_iov_remove_virtfn+0xba/0x120
[  141.425452]  sriov_disable+0x2f/0xf0
[  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
[  141.426353]  igb_remove+0xa0/0x130 [igb]
[  141.426599]  pci_device_remove+0x36/0xb0
[  141.426796]  device_release_driver_internal+0xc1/0x160
[  141.427060]  driver_detach+0x44/0x90
[  141.427253]  bus_remove_driver+0x55/0xe0
[  141.427477]  pci_unregister_driver+0x2a/0xa0
[  141.428296]  __x64_sys_delete_module+0x141/0x2b0
[  141.429126]  ? mntput_no_expire+0x4a/0x240
[  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
[  141.429653]  do_syscall_64+0x5b/0x80
[  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
[  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.430849]  ? do_syscall_64+0x67/0x80
[  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
[  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.432482]  ? do_syscall_64+0x67/0x80
[  141.432714]  ? exc_page_fault+0x64/0x140
[  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  141.433175] RIP: 0033:0x7ff04cc3a05b
[  141.433375] RSP: 002b:00007ffd891bcb38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  141.434337] RAX: ffffffffffffffda RBX: 000055fde256dd00 RCX: 00007ff04cc3a05b
[  141.435171] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055fde256dd68
[  141.435973] RBP: 000055fde256dd68 R08: 00007ffd891bbae1 R09: 0000000000000000
[  141.436852] R10: 00007ff04cd71480 R11: 0000000000000206 R12: 0000000000000000
[  141.437636] R13: 0000000000000000 R14: 000055fde256dd68 R15: 0000000000000000
[  141.438442]  </TASK>


Corinna


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
  2023-03-07 12:11       ` Corinna Vinschen
@ 2023-03-07 12:37         ` Lin Ma
  2023-03-07 12:55           ` Corinna Vinschen
  0 siblings, 1 reply; 13+ messages in thread
From: Lin Ma @ 2023-03-07 12:37 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: Paul Menzel, regressions

Hi Corinna,

Thanks for the crash log. Seems the reason I didn't successfully reproduce the bug is that I didn't actually enable the sriov. 
According to the log, the deadlock seems obvious. I'm soooo noob to make such a mistake.

> [  141.423324]  unregister_netdev+0xe/0x20  // <===== again grant rtnl_lock
> [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> [  141.423791]  pci_device_remove+0x36/0xb0
> [  141.423990]  device_release_driver_internal+0xc1/0x160
> [  141.424270]  pci_stop_bus_device+0x6d/0x90
> [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> [  141.425452]  sriov_disable+0x2f/0xf0
> [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> [  141.426353]  igb_remove+0xa0/0x130 [igb] // <===== first grant rtnl_lock

I will prepare the commit to revert this buggy commit and I will add the Report-by tag for you.

Regards
Lin


> From: "Corinna Vinschen" <vinschen@redhat.com>
> Sent Time: 2023-03-07 20:11:46 (Tuesday)
> To: "Lin Ma" <linma@zju.edu.cn>
> Cc: "Paul Menzel" <pmenzel@molgen.mpg.de>, intel-wired-lan <intel-wired-lan@lists.osuosl.org>, regressions <regressions@lists.linux.dev>
> Subject: Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
> 
> Hi Lin,
> 
> On Mar  7 18:36, Lin Ma wrote:
> > Hello there
> > 
> > Yeah I am looking at it. Could you please offer the crash log or
> > locking debug message if accessible? Thanks in advance.
> 
> The commands used to reroduce and the resulting log message:
> 
>   # echo 10 > /proc/sys/kernel/hung_task_timeout_secs
>   # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
>   # modprobe -r igb
>   [hang]
> 
> console log:
> 
> [  116.914656] pci 0000:84:10.2: [8086:1520] type 00 class 0x020000
> [  116.915722] pci 0000:84:10.6: [8086:1520] type 00 class 0x020000
> [  116.917013] igb 0000:84:00.2: 2 VFs allocated
> [  116.978350] igbvf: Intel(R) Gigabit Virtual Function Network Driver
> [  116.979072] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
> [  116.980253] igbvf 0000:84:10.2: enabling device (0000 -> 0002)
> [  116.982356] igbvf 0000:84:10.2: PF still in reset state. Is the PF interface up?
> [  116.983196] igbvf 0000:84:10.2: Assigning random MAC address.
> [  116.985058] igbvf 0000:84:10.2: PF still resetting
> [  117.011054] igbvf 0000:84:10.2: Intel(R) I350 Virtual Function
> [  117.011785] igbvf 0000:84:10.2: Address: c2:e5:c2:a2:75:00
> [  117.012189] igbvf 0000:84:10.6: enabling device (0000 -> 0002)
> [  117.023911] igbvf 0000:84:10.6: PF still in reset state. Is the PF interface up?
> [  117.024724] igbvf 0000:84:10.6: Assigning random MAC address.
> [  117.036215] igbvf 0000:84:10.6: PF still resetting
> [  117.037596] igbvf 0000:84:10.6: Intel(R) I350 Virtual Function
> [  117.038327] igbvf 0000:84:10.6: Address: ea:74:07:f4:28:7c
> [  117.047970] igbvf 0000:84:10.6 ens5f2v1: renamed from eth1
> [  117.062847] igbvf 0000:84:10.2 ens5f2v0: renamed from eth0
> [  117.080725] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> [  117.106106] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> [  117.107189] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> [  127.361316] igb 0000:84:00.3: removed PHC on ens5f3
> [  127.361975] igb 0000:84:00.3: DCA disabled
> [  127.483085] igb 0000:84:00.2: removed PHC on ens5f2
> [  127.483786] igb 0000:84:00.2: DCA disabled
> [  141.418410] INFO: task modprobe:2078 blocked for more than 10 seconds.
> [  141.418856]       Not tainted 6.2.0-rc8+ #12
> [  141.419184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  141.419616] task:modprobe        state:D stack:0     pid:2078  ppid:2037   flags:0x00004000
> [  141.420039] Call Trace:
> [  141.420169]  <TASK>
> [  141.420672]  __schedule+0x2dd/0x840
> [  141.421427]  schedule+0x50/0xc0
> [  141.422041]  schedule_preempt_disabled+0x11/0x20
> [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
> [  141.423324]  unregister_netdev+0xe/0x20
> [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> [  141.423791]  pci_device_remove+0x36/0xb0
> [  141.423990]  device_release_driver_internal+0xc1/0x160
> [  141.424270]  pci_stop_bus_device+0x6d/0x90
> [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> [  141.425452]  sriov_disable+0x2f/0xf0
> [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> [  141.426353]  igb_remove+0xa0/0x130 [igb]
> [  141.426599]  pci_device_remove+0x36/0xb0
> [  141.426796]  device_release_driver_internal+0xc1/0x160
> [  141.427060]  driver_detach+0x44/0x90
> [  141.427253]  bus_remove_driver+0x55/0xe0
> [  141.427477]  pci_unregister_driver+0x2a/0xa0
> [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
> [  141.429126]  ? mntput_no_expire+0x4a/0x240
> [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
> [  141.429653]  do_syscall_64+0x5b/0x80
> [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
> [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.430849]  ? do_syscall_64+0x67/0x80
> [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
> [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.432482]  ? do_syscall_64+0x67/0x80
> [  141.432714]  ? exc_page_fault+0x64/0x140
> [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [  141.433175] RIP: 0033:0x7ff04cc3a05b
> [  141.433375] RSP: 002b:00007ffd891bcb38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> [  141.434337] RAX: ffffffffffffffda RBX: 000055fde256dd00 RCX: 00007ff04cc3a05b
> [  141.435171] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055fde256dd68
> [  141.435973] RBP: 000055fde256dd68 R08: 00007ffd891bbae1 R09: 0000000000000000
> [  141.436852] R10: 00007ff04cd71480 R11: 0000000000000206 R12: 0000000000000000
> [  141.437636] R13: 0000000000000000 R14: 000055fde256dd68 R15: 0000000000000000
> [  141.438442]  </TASK>
> 
> 
> Corinna

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
  2023-03-07 12:37         ` Lin Ma
@ 2023-03-07 12:55           ` Corinna Vinschen
  0 siblings, 0 replies; 13+ messages in thread
From: Corinna Vinschen @ 2023-03-07 12:55 UTC (permalink / raw)
  To: Lin Ma; +Cc: intel-wired-lan, Paul Menzel, regressions

Hi Lin,

On Mar  7 20:37, Lin Ma wrote:
> Hi Corinna,
> 
> Thanks for the crash log. Seems the reason I didn't successfully reproduce the bug is that I didn't actually enable the sriov. 
> According to the log, the deadlock seems obvious. I'm soooo noob to make such a mistake.
> 
> > [  141.423324]  unregister_netdev+0xe/0x20  // <===== again grant rtnl_lock
> > [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> > [  141.423791]  pci_device_remove+0x36/0xb0
> > [  141.423990]  device_release_driver_internal+0xc1/0x160
> > [  141.424270]  pci_stop_bus_device+0x6d/0x90
> > [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> > [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> > [  141.425452]  sriov_disable+0x2f/0xf0
> > [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> > [  141.426353]  igb_remove+0xa0/0x130 [igb] // <===== first grant rtnl_lock
> 
> I will prepare the commit to revert this buggy commit and I will add
> the Report-by tag for you.

Great, thanks a lot!


Corinna





> Regards
> Lin
> 
> 
> > From: "Corinna Vinschen" <vinschen@redhat.com>
> > Sent Time: 2023-03-07 20:11:46 (Tuesday)
> > To: "Lin Ma" <linma@zju.edu.cn>
> > Cc: "Paul Menzel" <pmenzel@molgen.mpg.de>, intel-wired-lan <intel-wired-lan@lists.osuosl.org>, regressions <regressions@lists.linux.dev>
> > Subject: Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
> > 
> > Hi Lin,
> > 
> > On Mar  7 18:36, Lin Ma wrote:
> > > Hello there
> > > 
> > > Yeah I am looking at it. Could you please offer the crash log or
> > > locking debug message if accessible? Thanks in advance.
> > 
> > The commands used to reroduce and the resulting log message:
> > 
> >   # echo 10 > /proc/sys/kernel/hung_task_timeout_secs
> >   # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
> >   # modprobe -r igb
> >   [hang]
> > 
> > console log:
> > 
> > [  116.914656] pci 0000:84:10.2: [8086:1520] type 00 class 0x020000
> > [  116.915722] pci 0000:84:10.6: [8086:1520] type 00 class 0x020000
> > [  116.917013] igb 0000:84:00.2: 2 VFs allocated
> > [  116.978350] igbvf: Intel(R) Gigabit Virtual Function Network Driver
> > [  116.979072] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
> > [  116.980253] igbvf 0000:84:10.2: enabling device (0000 -> 0002)
> > [  116.982356] igbvf 0000:84:10.2: PF still in reset state. Is the PF interface up?
> > [  116.983196] igbvf 0000:84:10.2: Assigning random MAC address.
> > [  116.985058] igbvf 0000:84:10.2: PF still resetting
> > [  117.011054] igbvf 0000:84:10.2: Intel(R) I350 Virtual Function
> > [  117.011785] igbvf 0000:84:10.2: Address: c2:e5:c2:a2:75:00
> > [  117.012189] igbvf 0000:84:10.6: enabling device (0000 -> 0002)
> > [  117.023911] igbvf 0000:84:10.6: PF still in reset state. Is the PF interface up?
> > [  117.024724] igbvf 0000:84:10.6: Assigning random MAC address.
> > [  117.036215] igbvf 0000:84:10.6: PF still resetting
> > [  117.037596] igbvf 0000:84:10.6: Intel(R) I350 Virtual Function
> > [  117.038327] igbvf 0000:84:10.6: Address: ea:74:07:f4:28:7c
> > [  117.047970] igbvf 0000:84:10.6 ens5f2v1: renamed from eth1
> > [  117.062847] igbvf 0000:84:10.2 ens5f2v0: renamed from eth0
> > [  117.080725] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> > [  117.106106] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> > [  117.107189] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> > [  127.361316] igb 0000:84:00.3: removed PHC on ens5f3
> > [  127.361975] igb 0000:84:00.3: DCA disabled
> > [  127.483085] igb 0000:84:00.2: removed PHC on ens5f2
> > [  127.483786] igb 0000:84:00.2: DCA disabled
> > [  141.418410] INFO: task modprobe:2078 blocked for more than 10 seconds.
> > [  141.418856]       Not tainted 6.2.0-rc8+ #12
> > [  141.419184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  141.419616] task:modprobe        state:D stack:0     pid:2078  ppid:2037   flags:0x00004000
> > [  141.420039] Call Trace:
> > [  141.420169]  <TASK>
> > [  141.420672]  __schedule+0x2dd/0x840
> > [  141.421427]  schedule+0x50/0xc0
> > [  141.422041]  schedule_preempt_disabled+0x11/0x20
> > [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
> > [  141.423324]  unregister_netdev+0xe/0x20
> > [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> > [  141.423791]  pci_device_remove+0x36/0xb0
> > [  141.423990]  device_release_driver_internal+0xc1/0x160
> > [  141.424270]  pci_stop_bus_device+0x6d/0x90
> > [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> > [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> > [  141.425452]  sriov_disable+0x2f/0xf0
> > [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> > [  141.426353]  igb_remove+0xa0/0x130 [igb]
> > [  141.426599]  pci_device_remove+0x36/0xb0
> > [  141.426796]  device_release_driver_internal+0xc1/0x160
> > [  141.427060]  driver_detach+0x44/0x90
> > [  141.427253]  bus_remove_driver+0x55/0xe0
> > [  141.427477]  pci_unregister_driver+0x2a/0xa0
> > [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
> > [  141.429126]  ? mntput_no_expire+0x4a/0x240
> > [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
> > [  141.429653]  do_syscall_64+0x5b/0x80
> > [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
> > [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
> > [  141.430849]  ? do_syscall_64+0x67/0x80
> > [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
> > [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
> > [  141.432482]  ? do_syscall_64+0x67/0x80
> > [  141.432714]  ? exc_page_fault+0x64/0x140
> > [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > [  141.433175] RIP: 0033:0x7ff04cc3a05b
> > [  141.433375] RSP: 002b:00007ffd891bcb38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> > [  141.434337] RAX: ffffffffffffffda RBX: 000055fde256dd00 RCX: 00007ff04cc3a05b
> > [  141.435171] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055fde256dd68
> > [  141.435973] RBP: 000055fde256dd68 R08: 00007ffd891bbae1 R09: 0000000000000000
> > [  141.436852] R10: 00007ff04cd71480 R11: 0000000000000206 R12: 0000000000000000
> > [  141.437636] R13: 0000000000000000 R14: 000055fde256dd68 R15: 0000000000000000
> > [  141.438442]  </TASK>
> > 
> > 
> > Corinna
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 11:48     ` Lin Ma
@ 2023-03-07 13:05       ` Lin Ma
  2023-03-07 13:17         ` Linux regression tracking (Thorsten Leemhuis)
  2023-03-07 13:45         ` Corinna Vinschen
  0 siblings, 2 replies; 13+ messages in thread
From: Lin Ma @ 2023-03-07 13:05 UTC (permalink / raw)
  To: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk,
	john.fastabend
  Cc: intel-wired-lan, pmenzel, regressions, vinschen, Lin Ma, stable

The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
rtnl_lock to eliminate a false data race shown below

 (FREE from device detaching)      |   (USE from netdev core)
igb_remove                         |  igb_ndo_get_vf_config
 igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
  kfree(adapter->vf_data)          |
  adapter->vfs_allocated_count = 0 |
                                   |    memcpy(... adapter->vf_data[vf]

The above race will never happen and the extra rtnl_lock causes deadlock
below

[  141.420169]  <TASK>
[  141.420672]  __schedule+0x2dd/0x840
[  141.421427]  schedule+0x50/0xc0
[  141.422041]  schedule_preempt_disabled+0x11/0x20
[  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
[  141.423324]  unregister_netdev+0xe/0x20
[  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
[  141.423791]  pci_device_remove+0x36/0xb0
[  141.423990]  device_release_driver_internal+0xc1/0x160
[  141.424270]  pci_stop_bus_device+0x6d/0x90
[  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
[  141.424789]  pci_iov_remove_virtfn+0xba/0x120
[  141.425452]  sriov_disable+0x2f/0xf0
[  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
[  141.426353]  igb_remove+0xa0/0x130 [igb]
[  141.426599]  pci_device_remove+0x36/0xb0
[  141.426796]  device_release_driver_internal+0xc1/0x160
[  141.427060]  driver_detach+0x44/0x90
[  141.427253]  bus_remove_driver+0x55/0xe0
[  141.427477]  pci_unregister_driver+0x2a/0xa0
[  141.428296]  __x64_sys_delete_module+0x141/0x2b0
[  141.429126]  ? mntput_no_expire+0x4a/0x240
[  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
[  141.429653]  do_syscall_64+0x5b/0x80
[  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
[  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.430849]  ? do_syscall_64+0x67/0x80
[  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
[  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.432482]  ? do_syscall_64+0x67/0x80
[  141.432714]  ? exc_page_fault+0x64/0x140
[  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

Since the igb_disable_sriov() will call pci_disable_sriov() before
releasing any resources, the netdev core will synchronize the cleanup to
avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
correctness.

CC: stable@vger.kernel.org
Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
Reported-by: Corinna <vinschen@redhat.com>
Link: https://lore.kernel.org/regressions/3ef31c0b-ce40-20d0-7740-5dc0cca278ca@molgen.mpg.de/
Signed-off-by: Lin Ma <linma@zju.edu.cn>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 03bc1e8af575..5532361b0e94 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3863,9 +3863,7 @@ static void igb_remove(struct pci_dev *pdev)
 	igb_release_hw_control(adapter);
 
 #ifdef CONFIG_PCI_IOV
-	rtnl_lock();
 	igb_disable_sriov(pdev);
-	rtnl_unlock();
 #endif
 
 	unregister_netdev(netdev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 13:05       ` [PATCH] igb: revert rtnl_lock() that causes deadlock Lin Ma
@ 2023-03-07 13:17         ` Linux regression tracking (Thorsten Leemhuis)
  2023-03-07 13:45         ` Corinna Vinschen
  1 sibling, 0 replies; 13+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-03-07 13:17 UTC (permalink / raw)
  To: Lin Ma, jesse.brandeburg, anthony.l.nguyen, davem, edumazet,
	kuba, pabeni, netdev, linux-kernel, richardcochran, ast, daniel,
	hawk, john.fastabend
  Cc: intel-wired-lan, pmenzel, regressions, vinschen, stable

On 07.03.23 14:05, Lin Ma wrote:
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
> rtnl_lock to eliminate a false data race shown below
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> [...]
> CC: stable@vger.kernel.org
> Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
> Reported-by: Corinna <vinschen@redhat.com>
> Link: https://lore.kernel.org/regressions/3ef31c0b-ce40-20d0-7740-5dc0cca278ca@molgen.mpg.de/

FWIW, that afaics should be:

Link:
https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/

(that's the parent of the mail above)

> Signed-off-by: Lin Ma <linma@zju.edu.cn>

Ciao, Thorsten

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 13:05       ` [PATCH] igb: revert rtnl_lock() that causes deadlock Lin Ma
  2023-03-07 13:17         ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-03-07 13:45         ` Corinna Vinschen
  2023-03-07 15:29           ` [PATCH v2] " Lin Ma
  1 sibling, 1 reply; 13+ messages in thread
From: Corinna Vinschen @ 2023-03-07 13:45 UTC (permalink / raw)
  To: Lin Ma
  Cc: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk,
	john.fastabend, intel-wired-lan, pmenzel, regressions, stable

On Mar  7 21:05, Lin Ma wrote:
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
> rtnl_lock to eliminate a false data race shown below
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> The above race will never happen and the extra rtnl_lock causes deadlock
> below
> [...]
> CC: stable@vger.kernel.org
> Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
> Reported-by: Corinna <vinschen@redhat.com>

Thank you, but "Corinna Vinschen", please.


Thanks,
Corinna


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 13:45         ` Corinna Vinschen
@ 2023-03-07 15:29           ` Lin Ma
  2023-03-07 16:27             ` Corinna Vinschen
  2023-03-08 12:04             ` Simon Horman
  0 siblings, 2 replies; 13+ messages in thread
From: Lin Ma @ 2023-03-07 15:29 UTC (permalink / raw)
  To: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk,
	john.fastabend
  Cc: intel-wired-lan, pmenzel, regressions, vinschen, Lin Ma, stable

The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
rtnl_lock to eliminate a false data race shown below

 (FREE from device detaching)      |   (USE from netdev core)
igb_remove                         |  igb_ndo_get_vf_config
 igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
  kfree(adapter->vf_data)          |
  adapter->vfs_allocated_count = 0 |
                                   |    memcpy(... adapter->vf_data[vf]

The above race will never happen and the extra rtnl_lock causes deadlock
below

[  141.420169]  <TASK>
[  141.420672]  __schedule+0x2dd/0x840
[  141.421427]  schedule+0x50/0xc0
[  141.422041]  schedule_preempt_disabled+0x11/0x20
[  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
[  141.423324]  unregister_netdev+0xe/0x20
[  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
[  141.423791]  pci_device_remove+0x36/0xb0
[  141.423990]  device_release_driver_internal+0xc1/0x160
[  141.424270]  pci_stop_bus_device+0x6d/0x90
[  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
[  141.424789]  pci_iov_remove_virtfn+0xba/0x120
[  141.425452]  sriov_disable+0x2f/0xf0
[  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
[  141.426353]  igb_remove+0xa0/0x130 [igb]
[  141.426599]  pci_device_remove+0x36/0xb0
[  141.426796]  device_release_driver_internal+0xc1/0x160
[  141.427060]  driver_detach+0x44/0x90
[  141.427253]  bus_remove_driver+0x55/0xe0
[  141.427477]  pci_unregister_driver+0x2a/0xa0
[  141.428296]  __x64_sys_delete_module+0x141/0x2b0
[  141.429126]  ? mntput_no_expire+0x4a/0x240
[  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
[  141.429653]  do_syscall_64+0x5b/0x80
[  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
[  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.430849]  ? do_syscall_64+0x67/0x80
[  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
[  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.432482]  ? do_syscall_64+0x67/0x80
[  141.432714]  ? exc_page_fault+0x64/0x140
[  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

Since the igb_disable_sriov() will call pci_disable_sriov() before
releasing any resources, the netdev core will synchronize the cleanup to
avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
correctness.

CC: stable@vger.kernel.org
Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
Reported-by: Corinna Vinschen <vinschen@redhat.com>
Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/
Signed-off-by: Lin Ma <linma@zju.edu.cn>
---
V1->V2: update Link and correct Corinna's name
 drivers/net/ethernet/intel/igb/igb_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 03bc1e8af575..5532361b0e94 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3863,9 +3863,7 @@ static void igb_remove(struct pci_dev *pdev)
 	igb_release_hw_control(adapter);
 
 #ifdef CONFIG_PCI_IOV
-	rtnl_lock();
 	igb_disable_sriov(pdev);
-	rtnl_unlock();
 #endif
 
 	unregister_netdev(netdev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 15:29           ` [PATCH v2] " Lin Ma
@ 2023-03-07 16:27             ` Corinna Vinschen
  2023-03-08 12:04             ` Simon Horman
  1 sibling, 0 replies; 13+ messages in thread
From: Corinna Vinschen @ 2023-03-07 16:27 UTC (permalink / raw)
  To: Lin Ma
  Cc: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk,
	john.fastabend, intel-wired-lan, pmenzel, regressions, stable

On Mar  7 23:29, Lin Ma wrote:
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
> rtnl_lock to eliminate a false data race shown below
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> The above race will never happen and the extra rtnl_lock causes deadlock
> below
> 
> [  141.420169]  <TASK>
> [  141.420672]  __schedule+0x2dd/0x840
> [  141.421427]  schedule+0x50/0xc0
> [  141.422041]  schedule_preempt_disabled+0x11/0x20
> [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
> [  141.423324]  unregister_netdev+0xe/0x20
> [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> [  141.423791]  pci_device_remove+0x36/0xb0
> [  141.423990]  device_release_driver_internal+0xc1/0x160
> [  141.424270]  pci_stop_bus_device+0x6d/0x90
> [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> [  141.425452]  sriov_disable+0x2f/0xf0
> [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> [  141.426353]  igb_remove+0xa0/0x130 [igb]
> [  141.426599]  pci_device_remove+0x36/0xb0
> [  141.426796]  device_release_driver_internal+0xc1/0x160
> [  141.427060]  driver_detach+0x44/0x90
> [  141.427253]  bus_remove_driver+0x55/0xe0
> [  141.427477]  pci_unregister_driver+0x2a/0xa0
> [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
> [  141.429126]  ? mntput_no_expire+0x4a/0x240
> [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
> [  141.429653]  do_syscall_64+0x5b/0x80
> [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
> [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.430849]  ? do_syscall_64+0x67/0x80
> [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
> [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.432482]  ? do_syscall_64+0x67/0x80
> [  141.432714]  ? exc_page_fault+0x64/0x140
> [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> 
> Since the igb_disable_sriov() will call pci_disable_sriov() before
> releasing any resources, the netdev core will synchronize the cleanup to
> avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
> correctness.
> 
> CC: stable@vger.kernel.org
> Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
> Reported-by: Corinna Vinschen <vinschen@redhat.com>
> Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/
> Signed-off-by: Lin Ma <linma@zju.edu.cn>

Tested-by: Corinna Vinschen <vinschen@redhat.com>


Thanks,
Corinna


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 15:29           ` [PATCH v2] " Lin Ma
  2023-03-07 16:27             ` Corinna Vinschen
@ 2023-03-08 12:04             ` Simon Horman
  1 sibling, 0 replies; 13+ messages in thread
From: Simon Horman @ 2023-03-08 12:04 UTC (permalink / raw)
  To: Lin Ma
  Cc: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk,
	john.fastabend, intel-wired-lan, pmenzel, regressions, vinschen,
	stable

On Tue, Mar 07, 2023 at 11:29:17PM +0800, Lin Ma wrote:
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
> rtnl_lock to eliminate a false data race shown below
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> The above race will never happen and the extra rtnl_lock causes deadlock
> below
> 
> [  141.420169]  <TASK>
> [  141.420672]  __schedule+0x2dd/0x840
> [  141.421427]  schedule+0x50/0xc0
> [  141.422041]  schedule_preempt_disabled+0x11/0x20
> [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
> [  141.423324]  unregister_netdev+0xe/0x20
> [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> [  141.423791]  pci_device_remove+0x36/0xb0
> [  141.423990]  device_release_driver_internal+0xc1/0x160
> [  141.424270]  pci_stop_bus_device+0x6d/0x90
> [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> [  141.425452]  sriov_disable+0x2f/0xf0
> [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> [  141.426353]  igb_remove+0xa0/0x130 [igb]
> [  141.426599]  pci_device_remove+0x36/0xb0
> [  141.426796]  device_release_driver_internal+0xc1/0x160
> [  141.427060]  driver_detach+0x44/0x90
> [  141.427253]  bus_remove_driver+0x55/0xe0
> [  141.427477]  pci_unregister_driver+0x2a/0xa0
> [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
> [  141.429126]  ? mntput_no_expire+0x4a/0x240
> [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
> [  141.429653]  do_syscall_64+0x5b/0x80
> [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
> [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.430849]  ? do_syscall_64+0x67/0x80
> [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
> [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.432482]  ? do_syscall_64+0x67/0x80
> [  141.432714]  ? exc_page_fault+0x64/0x140
> [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> 
> Since the igb_disable_sriov() will call pci_disable_sriov() before
> releasing any resources, the netdev core will synchronize the cleanup to
> avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
> correctness.
> 
> CC: stable@vger.kernel.org
> Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
> Reported-by: Corinna Vinschen <vinschen@redhat.com>
> Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/
> Signed-off-by: Lin Ma <linma@zju.edu.cn>

Reviewed-by: Simon Horman <simon.horman@corigine.com>


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-03-08 12:05 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <ZAcJvkEPqWeJHO2r@calimero.vinschen.de>
2023-03-07 10:12 ` [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") Paul Menzel
2023-03-07 10:19   ` Paul Menzel
2023-03-07 10:36     ` Lin Ma
2023-03-07 12:11       ` Corinna Vinschen
2023-03-07 12:37         ` Lin Ma
2023-03-07 12:55           ` Corinna Vinschen
2023-03-07 11:48     ` Lin Ma
2023-03-07 13:05       ` [PATCH] igb: revert rtnl_lock() that causes deadlock Lin Ma
2023-03-07 13:17         ` Linux regression tracking (Thorsten Leemhuis)
2023-03-07 13:45         ` Corinna Vinschen
2023-03-07 15:29           ` [PATCH v2] " Lin Ma
2023-03-07 16:27             ` Corinna Vinschen
2023-03-08 12:04             ` Simon Horman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).