All of lore.kernel.org
 help / color / mirror / Atom feed
* [Intel-wired-lan] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
@ 2023-03-07  9:54 Corinna Vinschen
  2023-03-07 10:12   ` Paul Menzel
  0 siblings, 1 reply; 29+ messages in thread
From: Corinna Vinschen @ 2023-03-07  9:54 UTC (permalink / raw)
  To: Lin Ma, intel-wired-lan

Hi,


After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see
a deadlock scenario when trying to unload the igb module.

The reproducer is pretty simple:

  # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
  # modprobe -r igb

The hang is quite thorough, I assume it's suffering a deadlock between
the rtnl_lock and the spinlock introduced by 6faee3d4ee8b.

Any chance you could have a look?


Thanks,
Corinna

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
  2023-03-07  9:54 [Intel-wired-lan] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") Corinna Vinschen
@ 2023-03-07 10:12   ` Paul Menzel
  0 siblings, 0 replies; 29+ messages in thread
From: Paul Menzel @ 2023-03-07 10:12 UTC (permalink / raw)
  To: Lin Ma; +Cc: intel-wired-lan, regressions

[Adding regressions@lists.linux.dev]

#regzbot ^introduced: 6faee3d4ee8b

Am 07.03.23 um 10:54 schrieb Corinna Vinschen:
> Hi,
> 
> 
> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see
> a deadlock scenario when trying to unload the igb module.
> 
> The reproducer is pretty simple:
> 
>    # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
>    # modprobe -r igb
> 
> The hang is quite thorough, I assume it's suffering a deadlock between
> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b.
> 
> Any chance you could have a look?
> 
> 
> Thanks,
> Corinna
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
@ 2023-03-07 10:12   ` Paul Menzel
  0 siblings, 0 replies; 29+ messages in thread
From: Paul Menzel @ 2023-03-07 10:12 UTC (permalink / raw)
  To: Lin Ma; +Cc: intel-wired-lan, regressions

[Adding regressions@lists.linux.dev]

#regzbot ^introduced: 6faee3d4ee8b

Am 07.03.23 um 10:54 schrieb Corinna Vinschen:
> Hi,
> 
> 
> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see
> a deadlock scenario when trying to unload the igb module.
> 
> The reproducer is pretty simple:
> 
>    # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
>    # modprobe -r igb
> 
> The hang is quite thorough, I assume it's suffering a deadlock between
> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b.
> 
> Any chance you could have a look?
> 
> 
> Thanks,
> Corinna

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
  2023-03-07 10:12   ` Paul Menzel
@ 2023-03-07 10:19     ` Paul Menzel
  -1 siblings, 0 replies; 29+ messages in thread
From: Paul Menzel @ 2023-03-07 10:19 UTC (permalink / raw)
  To: Lin Ma, Corinna Vinschen; +Cc: intel-wired-lan, regressions

[Add Corinna back to Cc:, who was removed by Mozilla Thunderbird because 
Reply-To header was set for some reason in the original report.]

Am 07.03.23 um 11:12 schrieb Paul Menzel:
> [Adding regressions@lists.linux.dev]
> 
> #regzbot ^introduced: 6faee3d4ee8b
> 
> Am 07.03.23 um 10:54 schrieb Corinna Vinschen:
>> Hi,
>>
>>
>> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see
>> a deadlock scenario when trying to unload the igb module.
>>
>> The reproducer is pretty simple:
>>
>>    # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
>>    # modprobe -r igb
>>
>> The hang is quite thorough, I assume it's suffering a deadlock between
>> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b.
>>
>> Any chance you could have a look?
>>
>>
>> Thanks,
>> Corinna

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
@ 2023-03-07 10:19     ` Paul Menzel
  0 siblings, 0 replies; 29+ messages in thread
From: Paul Menzel @ 2023-03-07 10:19 UTC (permalink / raw)
  To: Lin Ma, Corinna Vinschen; +Cc: intel-wired-lan, regressions

[Add Corinna back to Cc:, who was removed by Mozilla Thunderbird because 
Reply-To header was set for some reason in the original report.]

Am 07.03.23 um 11:12 schrieb Paul Menzel:
> [Adding regressions@lists.linux.dev]
> 
> #regzbot ^introduced: 6faee3d4ee8b
> 
> Am 07.03.23 um 10:54 schrieb Corinna Vinschen:
>> Hi,
>>
>>
>> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see
>> a deadlock scenario when trying to unload the igb module.
>>
>> The reproducer is pretty simple:
>>
>>    # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
>>    # modprobe -r igb
>>
>> The hang is quite thorough, I assume it's suffering a deadlock between
>> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b.
>>
>> Any chance you could have a look?
>>
>>
>> Thanks,
>> Corinna
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
  2023-03-07 10:19     ` Paul Menzel
@ 2023-03-07 10:36       ` Lin Ma
  -1 siblings, 0 replies; 29+ messages in thread
From: Lin Ma @ 2023-03-07 10:36 UTC (permalink / raw)
  To: Paul Menzel; +Cc: intel-wired-lan, Corinna Vinschen, regressions

Hello there

Yeah I am looking at it. Could you please offer the crash log or locking debug message if accessible? Thanks in advance.

Regards
Lin

>[Add Corinna back to Cc:, who was removed by Mozilla Thunderbird because 
>Reply-To header was set for some reason in the original report.]
>
>Am 07.03.23 um 11:12 schrieb Paul Menzel:
>> [Adding regressions@lists.linux.dev]
>> 
>> #regzbot ^introduced: 6faee3d4ee8b
>> 
>> Am 07.03.23 um 10:54 schrieb Corinna Vinschen:
>>> Hi,
>>>
>>>
>>> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see
>>> a deadlock scenario when trying to unload the igb module.
>>>
>>> The reproducer is pretty simple:
>>>
>>>    # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
>>>    # modprobe -r igb
>>>
>>> The hang is quite thorough, I assume it's suffering a deadlock between
>>> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b.
>>>
>>> Any chance you could have a look?
>>>
>>>
>>> Thanks,
>>> Corinna
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
@ 2023-03-07 10:36       ` Lin Ma
  0 siblings, 0 replies; 29+ messages in thread
From: Lin Ma @ 2023-03-07 10:36 UTC (permalink / raw)
  To: Paul Menzel; +Cc: Corinna Vinschen, intel-wired-lan, regressions

Hello there

Yeah I am looking at it. Could you please offer the crash log or locking debug message if accessible? Thanks in advance.

Regards
Lin

>[Add Corinna back to Cc:, who was removed by Mozilla Thunderbird because 
>Reply-To header was set for some reason in the original report.]
>
>Am 07.03.23 um 11:12 schrieb Paul Menzel:
>> [Adding regressions@lists.linux.dev]
>> 
>> #regzbot ^introduced: 6faee3d4ee8b
>> 
>> Am 07.03.23 um 10:54 schrieb Corinna Vinschen:
>>> Hi,
>>>
>>>
>>> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see
>>> a deadlock scenario when trying to unload the igb module.
>>>
>>> The reproducer is pretty simple:
>>>
>>>    # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
>>>    # modprobe -r igb
>>>
>>> The hang is quite thorough, I assume it's suffering a deadlock between
>>> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b.
>>>
>>> Any chance you could have a look?
>>>
>>>
>>> Thanks,
>>> Corinna

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
  2023-03-07 10:19     ` Paul Menzel
@ 2023-03-07 11:48       ` Lin Ma
  -1 siblings, 0 replies; 29+ messages in thread
From: Lin Ma @ 2023-03-07 11:48 UTC (permalink / raw)
  To: Paul Menzel; +Cc: Corinna Vinschen, intel-wired-lan, regressions

Hello Paul and Corinna,

Since I didn't have the exact ens5f2 driver as you guys, I just test the module loading and unloading in my QEMU environment.
I turned on the CONFIG_PROVE_LOCKING option but failed to reproduce the deadlock.

root@syzkaller:/# insmod igb.ko
[  116.192730] igb: Intel(R) Gigabit Ethernet Network Driver
[  116.193128] igb: Copyright (c) 2007-2014 Intel Corporation.
root@syzkaller:/# lsmod
Module                  Size  Used by
igb                   225280  0
root@syzkaller:/# rmmod igb
rmmod: ERROR: ../libkmod/libkmod.c:514 lookup_builtin_file() could not open builtin file '/lib/modules/6.1.0/modules.builtin.bin'
root@syzkaller:/# lsmod
lsmod
Module                  Size  Used by

I guess the command "echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs" must do something interesting but I
didn't find the relevant code there. Could you guys take a look?

Regards
Lin



> From: "Paul Menzel" <pmenzel@molgen.mpg.de>
> Sent Time: 2023-03-07 18:19:18 (Tuesday)
> To: "Lin Ma" <linma@zju.edu.cn>, "Corinna Vinschen" <vinschen@redhat.com>
> Cc: intel-wired-lan@lists.osuosl.org, regressions@lists.linux.dev
> Subject: Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
> 
> [Add Corinna back to Cc:, who was removed by Mozilla Thunderbird because 
> Reply-To header was set for some reason in the original report.]
> 
> Am 07.03.23 um 11:12 schrieb Paul Menzel:
> > [Adding regressions@lists.linux.dev]
> > 
> > #regzbot ^introduced: 6faee3d4ee8b
> > 
> > Am 07.03.23 um 10:54 schrieb Corinna Vinschen:
> >> Hi,
> >>
> >>
> >> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see
> >> a deadlock scenario when trying to unload the igb module.
> >>
> >> The reproducer is pretty simple:
> >>
> >>    # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
> >>    # modprobe -r igb
> >>
> >> The hang is quite thorough, I assume it's suffering a deadlock between
> >> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b.
> >>
> >> Any chance you could have a look?
> >>
> >>
> >> Thanks,
> >> Corinna

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
@ 2023-03-07 11:48       ` Lin Ma
  0 siblings, 0 replies; 29+ messages in thread
From: Lin Ma @ 2023-03-07 11:48 UTC (permalink / raw)
  To: Paul Menzel; +Cc: intel-wired-lan, Corinna Vinschen, regressions

Hello Paul and Corinna,

Since I didn't have the exact ens5f2 driver as you guys, I just test the module loading and unloading in my QEMU environment.
I turned on the CONFIG_PROVE_LOCKING option but failed to reproduce the deadlock.

root@syzkaller:/# insmod igb.ko
[  116.192730] igb: Intel(R) Gigabit Ethernet Network Driver
[  116.193128] igb: Copyright (c) 2007-2014 Intel Corporation.
root@syzkaller:/# lsmod
Module                  Size  Used by
igb                   225280  0
root@syzkaller:/# rmmod igb
rmmod: ERROR: ../libkmod/libkmod.c:514 lookup_builtin_file() could not open builtin file '/lib/modules/6.1.0/modules.builtin.bin'
root@syzkaller:/# lsmod
lsmod
Module                  Size  Used by

I guess the command "echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs" must do something interesting but I
didn't find the relevant code there. Could you guys take a look?

Regards
Lin



> From: "Paul Menzel" <pmenzel@molgen.mpg.de>
> Sent Time: 2023-03-07 18:19:18 (Tuesday)
> To: "Lin Ma" <linma@zju.edu.cn>, "Corinna Vinschen" <vinschen@redhat.com>
> Cc: intel-wired-lan@lists.osuosl.org, regressions@lists.linux.dev
> Subject: Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
> 
> [Add Corinna back to Cc:, who was removed by Mozilla Thunderbird because 
> Reply-To header was set for some reason in the original report.]
> 
> Am 07.03.23 um 11:12 schrieb Paul Menzel:
> > [Adding regressions@lists.linux.dev]
> > 
> > #regzbot ^introduced: 6faee3d4ee8b
> > 
> > Am 07.03.23 um 10:54 schrieb Corinna Vinschen:
> >> Hi,
> >>
> >>
> >> After patch 6faee3d4ee8b ("igb: Add lock to avoid data race"), we see
> >> a deadlock scenario when trying to unload the igb module.
> >>
> >> The reproducer is pretty simple:
> >>
> >>    # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
> >>    # modprobe -r igb
> >>
> >> The hang is quite thorough, I assume it's suffering a deadlock between
> >> the rtnl_lock and the spinlock introduced by 6faee3d4ee8b.
> >>
> >> Any chance you could have a look?
> >>
> >>
> >> Thanks,
> >> Corinna
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
  2023-03-07 10:36       ` Lin Ma
@ 2023-03-07 12:11         ` Corinna Vinschen
  -1 siblings, 0 replies; 29+ messages in thread
From: Corinna Vinschen @ 2023-03-07 12:11 UTC (permalink / raw)
  To: Lin Ma; +Cc: Paul Menzel, intel-wired-lan, regressions

Hi Lin,

On Mar  7 18:36, Lin Ma wrote:
> Hello there
> 
> Yeah I am looking at it. Could you please offer the crash log or
> locking debug message if accessible? Thanks in advance.

The commands used to reroduce and the resulting log message:

  # echo 10 > /proc/sys/kernel/hung_task_timeout_secs
  # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
  # modprobe -r igb
  [hang]

console log:

[  116.914656] pci 0000:84:10.2: [8086:1520] type 00 class 0x020000
[  116.915722] pci 0000:84:10.6: [8086:1520] type 00 class 0x020000
[  116.917013] igb 0000:84:00.2: 2 VFs allocated
[  116.978350] igbvf: Intel(R) Gigabit Virtual Function Network Driver
[  116.979072] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[  116.980253] igbvf 0000:84:10.2: enabling device (0000 -> 0002)
[  116.982356] igbvf 0000:84:10.2: PF still in reset state. Is the PF interface up?
[  116.983196] igbvf 0000:84:10.2: Assigning random MAC address.
[  116.985058] igbvf 0000:84:10.2: PF still resetting
[  117.011054] igbvf 0000:84:10.2: Intel(R) I350 Virtual Function
[  117.011785] igbvf 0000:84:10.2: Address: c2:e5:c2:a2:75:00
[  117.012189] igbvf 0000:84:10.6: enabling device (0000 -> 0002)
[  117.023911] igbvf 0000:84:10.6: PF still in reset state. Is the PF interface up?
[  117.024724] igbvf 0000:84:10.6: Assigning random MAC address.
[  117.036215] igbvf 0000:84:10.6: PF still resetting
[  117.037596] igbvf 0000:84:10.6: Intel(R) I350 Virtual Function
[  117.038327] igbvf 0000:84:10.6: Address: ea:74:07:f4:28:7c
[  117.047970] igbvf 0000:84:10.6 ens5f2v1: renamed from eth1
[  117.062847] igbvf 0000:84:10.2 ens5f2v0: renamed from eth0
[  117.080725] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
[  117.106106] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
[  117.107189] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
[  127.361316] igb 0000:84:00.3: removed PHC on ens5f3
[  127.361975] igb 0000:84:00.3: DCA disabled
[  127.483085] igb 0000:84:00.2: removed PHC on ens5f2
[  127.483786] igb 0000:84:00.2: DCA disabled
[  141.418410] INFO: task modprobe:2078 blocked for more than 10 seconds.
[  141.418856]       Not tainted 6.2.0-rc8+ #12
[  141.419184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  141.419616] task:modprobe        state:D stack:0     pid:2078  ppid:2037   flags:0x00004000
[  141.420039] Call Trace:
[  141.420169]  <TASK>
[  141.420672]  __schedule+0x2dd/0x840
[  141.421427]  schedule+0x50/0xc0
[  141.422041]  schedule_preempt_disabled+0x11/0x20
[  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
[  141.423324]  unregister_netdev+0xe/0x20
[  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
[  141.423791]  pci_device_remove+0x36/0xb0
[  141.423990]  device_release_driver_internal+0xc1/0x160
[  141.424270]  pci_stop_bus_device+0x6d/0x90
[  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
[  141.424789]  pci_iov_remove_virtfn+0xba/0x120
[  141.425452]  sriov_disable+0x2f/0xf0
[  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
[  141.426353]  igb_remove+0xa0/0x130 [igb]
[  141.426599]  pci_device_remove+0x36/0xb0
[  141.426796]  device_release_driver_internal+0xc1/0x160
[  141.427060]  driver_detach+0x44/0x90
[  141.427253]  bus_remove_driver+0x55/0xe0
[  141.427477]  pci_unregister_driver+0x2a/0xa0
[  141.428296]  __x64_sys_delete_module+0x141/0x2b0
[  141.429126]  ? mntput_no_expire+0x4a/0x240
[  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
[  141.429653]  do_syscall_64+0x5b/0x80
[  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
[  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.430849]  ? do_syscall_64+0x67/0x80
[  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
[  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.432482]  ? do_syscall_64+0x67/0x80
[  141.432714]  ? exc_page_fault+0x64/0x140
[  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  141.433175] RIP: 0033:0x7ff04cc3a05b
[  141.433375] RSP: 002b:00007ffd891bcb38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  141.434337] RAX: ffffffffffffffda RBX: 000055fde256dd00 RCX: 00007ff04cc3a05b
[  141.435171] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055fde256dd68
[  141.435973] RBP: 000055fde256dd68 R08: 00007ffd891bbae1 R09: 0000000000000000
[  141.436852] R10: 00007ff04cd71480 R11: 0000000000000206 R12: 0000000000000000
[  141.437636] R13: 0000000000000000 R14: 000055fde256dd68 R15: 0000000000000000
[  141.438442]  </TASK>


Corinna

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
@ 2023-03-07 12:11         ` Corinna Vinschen
  0 siblings, 0 replies; 29+ messages in thread
From: Corinna Vinschen @ 2023-03-07 12:11 UTC (permalink / raw)
  To: Lin Ma; +Cc: Paul Menzel, intel-wired-lan, regressions

Hi Lin,

On Mar  7 18:36, Lin Ma wrote:
> Hello there
> 
> Yeah I am looking at it. Could you please offer the crash log or
> locking debug message if accessible? Thanks in advance.

The commands used to reroduce and the resulting log message:

  # echo 10 > /proc/sys/kernel/hung_task_timeout_secs
  # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
  # modprobe -r igb
  [hang]

console log:

[  116.914656] pci 0000:84:10.2: [8086:1520] type 00 class 0x020000
[  116.915722] pci 0000:84:10.6: [8086:1520] type 00 class 0x020000
[  116.917013] igb 0000:84:00.2: 2 VFs allocated
[  116.978350] igbvf: Intel(R) Gigabit Virtual Function Network Driver
[  116.979072] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[  116.980253] igbvf 0000:84:10.2: enabling device (0000 -> 0002)
[  116.982356] igbvf 0000:84:10.2: PF still in reset state. Is the PF interface up?
[  116.983196] igbvf 0000:84:10.2: Assigning random MAC address.
[  116.985058] igbvf 0000:84:10.2: PF still resetting
[  117.011054] igbvf 0000:84:10.2: Intel(R) I350 Virtual Function
[  117.011785] igbvf 0000:84:10.2: Address: c2:e5:c2:a2:75:00
[  117.012189] igbvf 0000:84:10.6: enabling device (0000 -> 0002)
[  117.023911] igbvf 0000:84:10.6: PF still in reset state. Is the PF interface up?
[  117.024724] igbvf 0000:84:10.6: Assigning random MAC address.
[  117.036215] igbvf 0000:84:10.6: PF still resetting
[  117.037596] igbvf 0000:84:10.6: Intel(R) I350 Virtual Function
[  117.038327] igbvf 0000:84:10.6: Address: ea:74:07:f4:28:7c
[  117.047970] igbvf 0000:84:10.6 ens5f2v1: renamed from eth1
[  117.062847] igbvf 0000:84:10.2 ens5f2v0: renamed from eth0
[  117.080725] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
[  117.106106] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
[  117.107189] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
[  127.361316] igb 0000:84:00.3: removed PHC on ens5f3
[  127.361975] igb 0000:84:00.3: DCA disabled
[  127.483085] igb 0000:84:00.2: removed PHC on ens5f2
[  127.483786] igb 0000:84:00.2: DCA disabled
[  141.418410] INFO: task modprobe:2078 blocked for more than 10 seconds.
[  141.418856]       Not tainted 6.2.0-rc8+ #12
[  141.419184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  141.419616] task:modprobe        state:D stack:0     pid:2078  ppid:2037   flags:0x00004000
[  141.420039] Call Trace:
[  141.420169]  <TASK>
[  141.420672]  __schedule+0x2dd/0x840
[  141.421427]  schedule+0x50/0xc0
[  141.422041]  schedule_preempt_disabled+0x11/0x20
[  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
[  141.423324]  unregister_netdev+0xe/0x20
[  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
[  141.423791]  pci_device_remove+0x36/0xb0
[  141.423990]  device_release_driver_internal+0xc1/0x160
[  141.424270]  pci_stop_bus_device+0x6d/0x90
[  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
[  141.424789]  pci_iov_remove_virtfn+0xba/0x120
[  141.425452]  sriov_disable+0x2f/0xf0
[  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
[  141.426353]  igb_remove+0xa0/0x130 [igb]
[  141.426599]  pci_device_remove+0x36/0xb0
[  141.426796]  device_release_driver_internal+0xc1/0x160
[  141.427060]  driver_detach+0x44/0x90
[  141.427253]  bus_remove_driver+0x55/0xe0
[  141.427477]  pci_unregister_driver+0x2a/0xa0
[  141.428296]  __x64_sys_delete_module+0x141/0x2b0
[  141.429126]  ? mntput_no_expire+0x4a/0x240
[  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
[  141.429653]  do_syscall_64+0x5b/0x80
[  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
[  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.430849]  ? do_syscall_64+0x67/0x80
[  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
[  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.432482]  ? do_syscall_64+0x67/0x80
[  141.432714]  ? exc_page_fault+0x64/0x140
[  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  141.433175] RIP: 0033:0x7ff04cc3a05b
[  141.433375] RSP: 002b:00007ffd891bcb38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  141.434337] RAX: ffffffffffffffda RBX: 000055fde256dd00 RCX: 00007ff04cc3a05b
[  141.435171] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055fde256dd68
[  141.435973] RBP: 000055fde256dd68 R08: 00007ffd891bbae1 R09: 0000000000000000
[  141.436852] R10: 00007ff04cd71480 R11: 0000000000000206 R12: 0000000000000000
[  141.437636] R13: 0000000000000000 R14: 000055fde256dd68 R15: 0000000000000000
[  141.438442]  </TASK>


Corinna


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
  2023-03-07 12:11         ` Corinna Vinschen
@ 2023-03-07 12:37           ` Lin Ma
  -1 siblings, 0 replies; 29+ messages in thread
From: Lin Ma @ 2023-03-07 12:37 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: Paul Menzel, regressions

Hi Corinna,

Thanks for the crash log. Seems the reason I didn't successfully reproduce the bug is that I didn't actually enable the sriov. 
According to the log, the deadlock seems obvious. I'm soooo noob to make such a mistake.

> [  141.423324]  unregister_netdev+0xe/0x20  // <===== again grant rtnl_lock
> [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> [  141.423791]  pci_device_remove+0x36/0xb0
> [  141.423990]  device_release_driver_internal+0xc1/0x160
> [  141.424270]  pci_stop_bus_device+0x6d/0x90
> [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> [  141.425452]  sriov_disable+0x2f/0xf0
> [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> [  141.426353]  igb_remove+0xa0/0x130 [igb] // <===== first grant rtnl_lock

I will prepare the commit to revert this buggy commit and I will add the Report-by tag for you.

Regards
Lin


> From: "Corinna Vinschen" <vinschen@redhat.com>
> Sent Time: 2023-03-07 20:11:46 (Tuesday)
> To: "Lin Ma" <linma@zju.edu.cn>
> Cc: "Paul Menzel" <pmenzel@molgen.mpg.de>, intel-wired-lan <intel-wired-lan@lists.osuosl.org>, regressions <regressions@lists.linux.dev>
> Subject: Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
> 
> Hi Lin,
> 
> On Mar  7 18:36, Lin Ma wrote:
> > Hello there
> > 
> > Yeah I am looking at it. Could you please offer the crash log or
> > locking debug message if accessible? Thanks in advance.
> 
> The commands used to reroduce and the resulting log message:
> 
>   # echo 10 > /proc/sys/kernel/hung_task_timeout_secs
>   # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
>   # modprobe -r igb
>   [hang]
> 
> console log:
> 
> [  116.914656] pci 0000:84:10.2: [8086:1520] type 00 class 0x020000
> [  116.915722] pci 0000:84:10.6: [8086:1520] type 00 class 0x020000
> [  116.917013] igb 0000:84:00.2: 2 VFs allocated
> [  116.978350] igbvf: Intel(R) Gigabit Virtual Function Network Driver
> [  116.979072] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
> [  116.980253] igbvf 0000:84:10.2: enabling device (0000 -> 0002)
> [  116.982356] igbvf 0000:84:10.2: PF still in reset state. Is the PF interface up?
> [  116.983196] igbvf 0000:84:10.2: Assigning random MAC address.
> [  116.985058] igbvf 0000:84:10.2: PF still resetting
> [  117.011054] igbvf 0000:84:10.2: Intel(R) I350 Virtual Function
> [  117.011785] igbvf 0000:84:10.2: Address: c2:e5:c2:a2:75:00
> [  117.012189] igbvf 0000:84:10.6: enabling device (0000 -> 0002)
> [  117.023911] igbvf 0000:84:10.6: PF still in reset state. Is the PF interface up?
> [  117.024724] igbvf 0000:84:10.6: Assigning random MAC address.
> [  117.036215] igbvf 0000:84:10.6: PF still resetting
> [  117.037596] igbvf 0000:84:10.6: Intel(R) I350 Virtual Function
> [  117.038327] igbvf 0000:84:10.6: Address: ea:74:07:f4:28:7c
> [  117.047970] igbvf 0000:84:10.6 ens5f2v1: renamed from eth1
> [  117.062847] igbvf 0000:84:10.2 ens5f2v0: renamed from eth0
> [  117.080725] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> [  117.106106] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> [  117.107189] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> [  127.361316] igb 0000:84:00.3: removed PHC on ens5f3
> [  127.361975] igb 0000:84:00.3: DCA disabled
> [  127.483085] igb 0000:84:00.2: removed PHC on ens5f2
> [  127.483786] igb 0000:84:00.2: DCA disabled
> [  141.418410] INFO: task modprobe:2078 blocked for more than 10 seconds.
> [  141.418856]       Not tainted 6.2.0-rc8+ #12
> [  141.419184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  141.419616] task:modprobe        state:D stack:0     pid:2078  ppid:2037   flags:0x00004000
> [  141.420039] Call Trace:
> [  141.420169]  <TASK>
> [  141.420672]  __schedule+0x2dd/0x840
> [  141.421427]  schedule+0x50/0xc0
> [  141.422041]  schedule_preempt_disabled+0x11/0x20
> [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
> [  141.423324]  unregister_netdev+0xe/0x20
> [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> [  141.423791]  pci_device_remove+0x36/0xb0
> [  141.423990]  device_release_driver_internal+0xc1/0x160
> [  141.424270]  pci_stop_bus_device+0x6d/0x90
> [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> [  141.425452]  sriov_disable+0x2f/0xf0
> [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> [  141.426353]  igb_remove+0xa0/0x130 [igb]
> [  141.426599]  pci_device_remove+0x36/0xb0
> [  141.426796]  device_release_driver_internal+0xc1/0x160
> [  141.427060]  driver_detach+0x44/0x90
> [  141.427253]  bus_remove_driver+0x55/0xe0
> [  141.427477]  pci_unregister_driver+0x2a/0xa0
> [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
> [  141.429126]  ? mntput_no_expire+0x4a/0x240
> [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
> [  141.429653]  do_syscall_64+0x5b/0x80
> [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
> [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.430849]  ? do_syscall_64+0x67/0x80
> [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
> [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.432482]  ? do_syscall_64+0x67/0x80
> [  141.432714]  ? exc_page_fault+0x64/0x140
> [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [  141.433175] RIP: 0033:0x7ff04cc3a05b
> [  141.433375] RSP: 002b:00007ffd891bcb38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> [  141.434337] RAX: ffffffffffffffda RBX: 000055fde256dd00 RCX: 00007ff04cc3a05b
> [  141.435171] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055fde256dd68
> [  141.435973] RBP: 000055fde256dd68 R08: 00007ffd891bbae1 R09: 0000000000000000
> [  141.436852] R10: 00007ff04cd71480 R11: 0000000000000206 R12: 0000000000000000
> [  141.437636] R13: 0000000000000000 R14: 000055fde256dd68 R15: 0000000000000000
> [  141.438442]  </TASK>
> 
> 
> Corinna
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
@ 2023-03-07 12:37           ` Lin Ma
  0 siblings, 0 replies; 29+ messages in thread
From: Lin Ma @ 2023-03-07 12:37 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: Paul Menzel, regressions

Hi Corinna,

Thanks for the crash log. Seems the reason I didn't successfully reproduce the bug is that I didn't actually enable the sriov. 
According to the log, the deadlock seems obvious. I'm soooo noob to make such a mistake.

> [  141.423324]  unregister_netdev+0xe/0x20  // <===== again grant rtnl_lock
> [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> [  141.423791]  pci_device_remove+0x36/0xb0
> [  141.423990]  device_release_driver_internal+0xc1/0x160
> [  141.424270]  pci_stop_bus_device+0x6d/0x90
> [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> [  141.425452]  sriov_disable+0x2f/0xf0
> [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> [  141.426353]  igb_remove+0xa0/0x130 [igb] // <===== first grant rtnl_lock

I will prepare the commit to revert this buggy commit and I will add the Report-by tag for you.

Regards
Lin


> From: "Corinna Vinschen" <vinschen@redhat.com>
> Sent Time: 2023-03-07 20:11:46 (Tuesday)
> To: "Lin Ma" <linma@zju.edu.cn>
> Cc: "Paul Menzel" <pmenzel@molgen.mpg.de>, intel-wired-lan <intel-wired-lan@lists.osuosl.org>, regressions <regressions@lists.linux.dev>
> Subject: Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
> 
> Hi Lin,
> 
> On Mar  7 18:36, Lin Ma wrote:
> > Hello there
> > 
> > Yeah I am looking at it. Could you please offer the crash log or
> > locking debug message if accessible? Thanks in advance.
> 
> The commands used to reroduce and the resulting log message:
> 
>   # echo 10 > /proc/sys/kernel/hung_task_timeout_secs
>   # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
>   # modprobe -r igb
>   [hang]
> 
> console log:
> 
> [  116.914656] pci 0000:84:10.2: [8086:1520] type 00 class 0x020000
> [  116.915722] pci 0000:84:10.6: [8086:1520] type 00 class 0x020000
> [  116.917013] igb 0000:84:00.2: 2 VFs allocated
> [  116.978350] igbvf: Intel(R) Gigabit Virtual Function Network Driver
> [  116.979072] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
> [  116.980253] igbvf 0000:84:10.2: enabling device (0000 -> 0002)
> [  116.982356] igbvf 0000:84:10.2: PF still in reset state. Is the PF interface up?
> [  116.983196] igbvf 0000:84:10.2: Assigning random MAC address.
> [  116.985058] igbvf 0000:84:10.2: PF still resetting
> [  117.011054] igbvf 0000:84:10.2: Intel(R) I350 Virtual Function
> [  117.011785] igbvf 0000:84:10.2: Address: c2:e5:c2:a2:75:00
> [  117.012189] igbvf 0000:84:10.6: enabling device (0000 -> 0002)
> [  117.023911] igbvf 0000:84:10.6: PF still in reset state. Is the PF interface up?
> [  117.024724] igbvf 0000:84:10.6: Assigning random MAC address.
> [  117.036215] igbvf 0000:84:10.6: PF still resetting
> [  117.037596] igbvf 0000:84:10.6: Intel(R) I350 Virtual Function
> [  117.038327] igbvf 0000:84:10.6: Address: ea:74:07:f4:28:7c
> [  117.047970] igbvf 0000:84:10.6 ens5f2v1: renamed from eth1
> [  117.062847] igbvf 0000:84:10.2 ens5f2v0: renamed from eth0
> [  117.080725] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> [  117.106106] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> [  117.107189] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> [  127.361316] igb 0000:84:00.3: removed PHC on ens5f3
> [  127.361975] igb 0000:84:00.3: DCA disabled
> [  127.483085] igb 0000:84:00.2: removed PHC on ens5f2
> [  127.483786] igb 0000:84:00.2: DCA disabled
> [  141.418410] INFO: task modprobe:2078 blocked for more than 10 seconds.
> [  141.418856]       Not tainted 6.2.0-rc8+ #12
> [  141.419184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  141.419616] task:modprobe        state:D stack:0     pid:2078  ppid:2037   flags:0x00004000
> [  141.420039] Call Trace:
> [  141.420169]  <TASK>
> [  141.420672]  __schedule+0x2dd/0x840
> [  141.421427]  schedule+0x50/0xc0
> [  141.422041]  schedule_preempt_disabled+0x11/0x20
> [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
> [  141.423324]  unregister_netdev+0xe/0x20
> [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> [  141.423791]  pci_device_remove+0x36/0xb0
> [  141.423990]  device_release_driver_internal+0xc1/0x160
> [  141.424270]  pci_stop_bus_device+0x6d/0x90
> [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> [  141.425452]  sriov_disable+0x2f/0xf0
> [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> [  141.426353]  igb_remove+0xa0/0x130 [igb]
> [  141.426599]  pci_device_remove+0x36/0xb0
> [  141.426796]  device_release_driver_internal+0xc1/0x160
> [  141.427060]  driver_detach+0x44/0x90
> [  141.427253]  bus_remove_driver+0x55/0xe0
> [  141.427477]  pci_unregister_driver+0x2a/0xa0
> [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
> [  141.429126]  ? mntput_no_expire+0x4a/0x240
> [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
> [  141.429653]  do_syscall_64+0x5b/0x80
> [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
> [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.430849]  ? do_syscall_64+0x67/0x80
> [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
> [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.432482]  ? do_syscall_64+0x67/0x80
> [  141.432714]  ? exc_page_fault+0x64/0x140
> [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [  141.433175] RIP: 0033:0x7ff04cc3a05b
> [  141.433375] RSP: 002b:00007ffd891bcb38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> [  141.434337] RAX: ffffffffffffffda RBX: 000055fde256dd00 RCX: 00007ff04cc3a05b
> [  141.435171] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055fde256dd68
> [  141.435973] RBP: 000055fde256dd68 R08: 00007ffd891bbae1 R09: 0000000000000000
> [  141.436852] R10: 00007ff04cd71480 R11: 0000000000000206 R12: 0000000000000000
> [  141.437636] R13: 0000000000000000 R14: 000055fde256dd68 R15: 0000000000000000
> [  141.438442]  </TASK>
> 
> 
> Corinna

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
  2023-03-07 12:37           ` Lin Ma
@ 2023-03-07 12:55             ` Corinna Vinschen
  -1 siblings, 0 replies; 29+ messages in thread
From: Corinna Vinschen @ 2023-03-07 12:55 UTC (permalink / raw)
  To: Lin Ma; +Cc: intel-wired-lan, Paul Menzel, regressions

Hi Lin,

On Mar  7 20:37, Lin Ma wrote:
> Hi Corinna,
> 
> Thanks for the crash log. Seems the reason I didn't successfully reproduce the bug is that I didn't actually enable the sriov. 
> According to the log, the deadlock seems obvious. I'm soooo noob to make such a mistake.
> 
> > [  141.423324]  unregister_netdev+0xe/0x20  // <===== again grant rtnl_lock
> > [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> > [  141.423791]  pci_device_remove+0x36/0xb0
> > [  141.423990]  device_release_driver_internal+0xc1/0x160
> > [  141.424270]  pci_stop_bus_device+0x6d/0x90
> > [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> > [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> > [  141.425452]  sriov_disable+0x2f/0xf0
> > [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> > [  141.426353]  igb_remove+0xa0/0x130 [igb] // <===== first grant rtnl_lock
> 
> I will prepare the commit to revert this buggy commit and I will add
> the Report-by tag for you.

Great, thanks a lot!


Corinna





> Regards
> Lin
> 
> 
> > From: "Corinna Vinschen" <vinschen@redhat.com>
> > Sent Time: 2023-03-07 20:11:46 (Tuesday)
> > To: "Lin Ma" <linma@zju.edu.cn>
> > Cc: "Paul Menzel" <pmenzel@molgen.mpg.de>, intel-wired-lan <intel-wired-lan@lists.osuosl.org>, regressions <regressions@lists.linux.dev>
> > Subject: Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
> > 
> > Hi Lin,
> > 
> > On Mar  7 18:36, Lin Ma wrote:
> > > Hello there
> > > 
> > > Yeah I am looking at it. Could you please offer the crash log or
> > > locking debug message if accessible? Thanks in advance.
> > 
> > The commands used to reroduce and the resulting log message:
> > 
> >   # echo 10 > /proc/sys/kernel/hung_task_timeout_secs
> >   # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
> >   # modprobe -r igb
> >   [hang]
> > 
> > console log:
> > 
> > [  116.914656] pci 0000:84:10.2: [8086:1520] type 00 class 0x020000
> > [  116.915722] pci 0000:84:10.6: [8086:1520] type 00 class 0x020000
> > [  116.917013] igb 0000:84:00.2: 2 VFs allocated
> > [  116.978350] igbvf: Intel(R) Gigabit Virtual Function Network Driver
> > [  116.979072] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
> > [  116.980253] igbvf 0000:84:10.2: enabling device (0000 -> 0002)
> > [  116.982356] igbvf 0000:84:10.2: PF still in reset state. Is the PF interface up?
> > [  116.983196] igbvf 0000:84:10.2: Assigning random MAC address.
> > [  116.985058] igbvf 0000:84:10.2: PF still resetting
> > [  117.011054] igbvf 0000:84:10.2: Intel(R) I350 Virtual Function
> > [  117.011785] igbvf 0000:84:10.2: Address: c2:e5:c2:a2:75:00
> > [  117.012189] igbvf 0000:84:10.6: enabling device (0000 -> 0002)
> > [  117.023911] igbvf 0000:84:10.6: PF still in reset state. Is the PF interface up?
> > [  117.024724] igbvf 0000:84:10.6: Assigning random MAC address.
> > [  117.036215] igbvf 0000:84:10.6: PF still resetting
> > [  117.037596] igbvf 0000:84:10.6: Intel(R) I350 Virtual Function
> > [  117.038327] igbvf 0000:84:10.6: Address: ea:74:07:f4:28:7c
> > [  117.047970] igbvf 0000:84:10.6 ens5f2v1: renamed from eth1
> > [  117.062847] igbvf 0000:84:10.2 ens5f2v0: renamed from eth0
> > [  117.080725] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> > [  117.106106] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> > [  117.107189] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> > [  127.361316] igb 0000:84:00.3: removed PHC on ens5f3
> > [  127.361975] igb 0000:84:00.3: DCA disabled
> > [  127.483085] igb 0000:84:00.2: removed PHC on ens5f2
> > [  127.483786] igb 0000:84:00.2: DCA disabled
> > [  141.418410] INFO: task modprobe:2078 blocked for more than 10 seconds.
> > [  141.418856]       Not tainted 6.2.0-rc8+ #12
> > [  141.419184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  141.419616] task:modprobe        state:D stack:0     pid:2078  ppid:2037   flags:0x00004000
> > [  141.420039] Call Trace:
> > [  141.420169]  <TASK>
> > [  141.420672]  __schedule+0x2dd/0x840
> > [  141.421427]  schedule+0x50/0xc0
> > [  141.422041]  schedule_preempt_disabled+0x11/0x20
> > [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
> > [  141.423324]  unregister_netdev+0xe/0x20
> > [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> > [  141.423791]  pci_device_remove+0x36/0xb0
> > [  141.423990]  device_release_driver_internal+0xc1/0x160
> > [  141.424270]  pci_stop_bus_device+0x6d/0x90
> > [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> > [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> > [  141.425452]  sriov_disable+0x2f/0xf0
> > [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> > [  141.426353]  igb_remove+0xa0/0x130 [igb]
> > [  141.426599]  pci_device_remove+0x36/0xb0
> > [  141.426796]  device_release_driver_internal+0xc1/0x160
> > [  141.427060]  driver_detach+0x44/0x90
> > [  141.427253]  bus_remove_driver+0x55/0xe0
> > [  141.427477]  pci_unregister_driver+0x2a/0xa0
> > [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
> > [  141.429126]  ? mntput_no_expire+0x4a/0x240
> > [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
> > [  141.429653]  do_syscall_64+0x5b/0x80
> > [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
> > [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
> > [  141.430849]  ? do_syscall_64+0x67/0x80
> > [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
> > [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
> > [  141.432482]  ? do_syscall_64+0x67/0x80
> > [  141.432714]  ? exc_page_fault+0x64/0x140
> > [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > [  141.433175] RIP: 0033:0x7ff04cc3a05b
> > [  141.433375] RSP: 002b:00007ffd891bcb38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> > [  141.434337] RAX: ffffffffffffffda RBX: 000055fde256dd00 RCX: 00007ff04cc3a05b
> > [  141.435171] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055fde256dd68
> > [  141.435973] RBP: 000055fde256dd68 R08: 00007ffd891bbae1 R09: 0000000000000000
> > [  141.436852] R10: 00007ff04cd71480 R11: 0000000000000206 R12: 0000000000000000
> > [  141.437636] R13: 0000000000000000 R14: 000055fde256dd68 R15: 0000000000000000
> > [  141.438442]  </TASK>
> > 
> > 
> > Corinna
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
@ 2023-03-07 12:55             ` Corinna Vinschen
  0 siblings, 0 replies; 29+ messages in thread
From: Corinna Vinschen @ 2023-03-07 12:55 UTC (permalink / raw)
  To: Lin Ma; +Cc: Paul Menzel, intel-wired-lan, regressions

Hi Lin,

On Mar  7 20:37, Lin Ma wrote:
> Hi Corinna,
> 
> Thanks for the crash log. Seems the reason I didn't successfully reproduce the bug is that I didn't actually enable the sriov. 
> According to the log, the deadlock seems obvious. I'm soooo noob to make such a mistake.
> 
> > [  141.423324]  unregister_netdev+0xe/0x20  // <===== again grant rtnl_lock
> > [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> > [  141.423791]  pci_device_remove+0x36/0xb0
> > [  141.423990]  device_release_driver_internal+0xc1/0x160
> > [  141.424270]  pci_stop_bus_device+0x6d/0x90
> > [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> > [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> > [  141.425452]  sriov_disable+0x2f/0xf0
> > [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> > [  141.426353]  igb_remove+0xa0/0x130 [igb] // <===== first grant rtnl_lock
> 
> I will prepare the commit to revert this buggy commit and I will add
> the Report-by tag for you.

Great, thanks a lot!


Corinna





> Regards
> Lin
> 
> 
> > From: "Corinna Vinschen" <vinschen@redhat.com>
> > Sent Time: 2023-03-07 20:11:46 (Tuesday)
> > To: "Lin Ma" <linma@zju.edu.cn>
> > Cc: "Paul Menzel" <pmenzel@molgen.mpg.de>, intel-wired-lan <intel-wired-lan@lists.osuosl.org>, regressions <regressions@lists.linux.dev>
> > Subject: Re: [Intel-wired-lan] [REGRESSION] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race")
> > 
> > Hi Lin,
> > 
> > On Mar  7 18:36, Lin Ma wrote:
> > > Hello there
> > > 
> > > Yeah I am looking at it. Could you please offer the crash log or
> > > locking debug message if accessible? Thanks in advance.
> > 
> > The commands used to reroduce and the resulting log message:
> > 
> >   # echo 10 > /proc/sys/kernel/hung_task_timeout_secs
> >   # echo 2 > /sys/class/net/ens5f2/device/sriov_numvfs
> >   # modprobe -r igb
> >   [hang]
> > 
> > console log:
> > 
> > [  116.914656] pci 0000:84:10.2: [8086:1520] type 00 class 0x020000
> > [  116.915722] pci 0000:84:10.6: [8086:1520] type 00 class 0x020000
> > [  116.917013] igb 0000:84:00.2: 2 VFs allocated
> > [  116.978350] igbvf: Intel(R) Gigabit Virtual Function Network Driver
> > [  116.979072] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
> > [  116.980253] igbvf 0000:84:10.2: enabling device (0000 -> 0002)
> > [  116.982356] igbvf 0000:84:10.2: PF still in reset state. Is the PF interface up?
> > [  116.983196] igbvf 0000:84:10.2: Assigning random MAC address.
> > [  116.985058] igbvf 0000:84:10.2: PF still resetting
> > [  117.011054] igbvf 0000:84:10.2: Intel(R) I350 Virtual Function
> > [  117.011785] igbvf 0000:84:10.2: Address: c2:e5:c2:a2:75:00
> > [  117.012189] igbvf 0000:84:10.6: enabling device (0000 -> 0002)
> > [  117.023911] igbvf 0000:84:10.6: PF still in reset state. Is the PF interface up?
> > [  117.024724] igbvf 0000:84:10.6: Assigning random MAC address.
> > [  117.036215] igbvf 0000:84:10.6: PF still resetting
> > [  117.037596] igbvf 0000:84:10.6: Intel(R) I350 Virtual Function
> > [  117.038327] igbvf 0000:84:10.6: Address: ea:74:07:f4:28:7c
> > [  117.047970] igbvf 0000:84:10.6 ens5f2v1: renamed from eth1
> > [  117.062847] igbvf 0000:84:10.2 ens5f2v0: renamed from eth0
> > [  117.080725] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> > [  117.106106] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> > [  117.107189] igb 0000:84:00.2: VF 1 attempted to set invalid MAC filter
> > [  127.361316] igb 0000:84:00.3: removed PHC on ens5f3
> > [  127.361975] igb 0000:84:00.3: DCA disabled
> > [  127.483085] igb 0000:84:00.2: removed PHC on ens5f2
> > [  127.483786] igb 0000:84:00.2: DCA disabled
> > [  141.418410] INFO: task modprobe:2078 blocked for more than 10 seconds.
> > [  141.418856]       Not tainted 6.2.0-rc8+ #12
> > [  141.419184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  141.419616] task:modprobe        state:D stack:0     pid:2078  ppid:2037   flags:0x00004000
> > [  141.420039] Call Trace:
> > [  141.420169]  <TASK>
> > [  141.420672]  __schedule+0x2dd/0x840
> > [  141.421427]  schedule+0x50/0xc0
> > [  141.422041]  schedule_preempt_disabled+0x11/0x20
> > [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
> > [  141.423324]  unregister_netdev+0xe/0x20
> > [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> > [  141.423791]  pci_device_remove+0x36/0xb0
> > [  141.423990]  device_release_driver_internal+0xc1/0x160
> > [  141.424270]  pci_stop_bus_device+0x6d/0x90
> > [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> > [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> > [  141.425452]  sriov_disable+0x2f/0xf0
> > [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> > [  141.426353]  igb_remove+0xa0/0x130 [igb]
> > [  141.426599]  pci_device_remove+0x36/0xb0
> > [  141.426796]  device_release_driver_internal+0xc1/0x160
> > [  141.427060]  driver_detach+0x44/0x90
> > [  141.427253]  bus_remove_driver+0x55/0xe0
> > [  141.427477]  pci_unregister_driver+0x2a/0xa0
> > [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
> > [  141.429126]  ? mntput_no_expire+0x4a/0x240
> > [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
> > [  141.429653]  do_syscall_64+0x5b/0x80
> > [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
> > [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
> > [  141.430849]  ? do_syscall_64+0x67/0x80
> > [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
> > [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
> > [  141.432482]  ? do_syscall_64+0x67/0x80
> > [  141.432714]  ? exc_page_fault+0x64/0x140
> > [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > [  141.433175] RIP: 0033:0x7ff04cc3a05b
> > [  141.433375] RSP: 002b:00007ffd891bcb38 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> > [  141.434337] RAX: ffffffffffffffda RBX: 000055fde256dd00 RCX: 00007ff04cc3a05b
> > [  141.435171] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055fde256dd68
> > [  141.435973] RBP: 000055fde256dd68 R08: 00007ffd891bbae1 R09: 0000000000000000
> > [  141.436852] R10: 00007ff04cd71480 R11: 0000000000000206 R12: 0000000000000000
> > [  141.437636] R13: 0000000000000000 R14: 000055fde256dd68 R15: 0000000000000000
> > [  141.438442]  </TASK>
> > 
> > 
> > Corinna
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Intel-wired-lan] [PATCH] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 11:48       ` Lin Ma
@ 2023-03-07 13:05         ` Lin Ma
  -1 siblings, 0 replies; 29+ messages in thread
From: Lin Ma @ 2023-03-07 13:05 UTC (permalink / raw)
  To: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk,
	john.fastabend
  Cc: pmenzel, vinschen, regressions, stable, intel-wired-lan, Lin Ma

The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
rtnl_lock to eliminate a false data race shown below

 (FREE from device detaching)      |   (USE from netdev core)
igb_remove                         |  igb_ndo_get_vf_config
 igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
  kfree(adapter->vf_data)          |
  adapter->vfs_allocated_count = 0 |
                                   |    memcpy(... adapter->vf_data[vf]

The above race will never happen and the extra rtnl_lock causes deadlock
below

[  141.420169]  <TASK>
[  141.420672]  __schedule+0x2dd/0x840
[  141.421427]  schedule+0x50/0xc0
[  141.422041]  schedule_preempt_disabled+0x11/0x20
[  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
[  141.423324]  unregister_netdev+0xe/0x20
[  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
[  141.423791]  pci_device_remove+0x36/0xb0
[  141.423990]  device_release_driver_internal+0xc1/0x160
[  141.424270]  pci_stop_bus_device+0x6d/0x90
[  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
[  141.424789]  pci_iov_remove_virtfn+0xba/0x120
[  141.425452]  sriov_disable+0x2f/0xf0
[  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
[  141.426353]  igb_remove+0xa0/0x130 [igb]
[  141.426599]  pci_device_remove+0x36/0xb0
[  141.426796]  device_release_driver_internal+0xc1/0x160
[  141.427060]  driver_detach+0x44/0x90
[  141.427253]  bus_remove_driver+0x55/0xe0
[  141.427477]  pci_unregister_driver+0x2a/0xa0
[  141.428296]  __x64_sys_delete_module+0x141/0x2b0
[  141.429126]  ? mntput_no_expire+0x4a/0x240
[  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
[  141.429653]  do_syscall_64+0x5b/0x80
[  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
[  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.430849]  ? do_syscall_64+0x67/0x80
[  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
[  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.432482]  ? do_syscall_64+0x67/0x80
[  141.432714]  ? exc_page_fault+0x64/0x140
[  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

Since the igb_disable_sriov() will call pci_disable_sriov() before
releasing any resources, the netdev core will synchronize the cleanup to
avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
correctness.

CC: stable@vger.kernel.org
Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
Reported-by: Corinna <vinschen@redhat.com>
Link: https://lore.kernel.org/regressions/3ef31c0b-ce40-20d0-7740-5dc0cca278ca@molgen.mpg.de/
Signed-off-by: Lin Ma <linma@zju.edu.cn>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 03bc1e8af575..5532361b0e94 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3863,9 +3863,7 @@ static void igb_remove(struct pci_dev *pdev)
 	igb_release_hw_control(adapter);
 
 #ifdef CONFIG_PCI_IOV
-	rtnl_lock();
 	igb_disable_sriov(pdev);
-	rtnl_unlock();
 #endif
 
 	unregister_netdev(netdev);
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH] igb: revert rtnl_lock() that causes deadlock
@ 2023-03-07 13:05         ` Lin Ma
  0 siblings, 0 replies; 29+ messages in thread
From: Lin Ma @ 2023-03-07 13:05 UTC (permalink / raw)
  To: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk,
	john.fastabend
  Cc: intel-wired-lan, pmenzel, regressions, vinschen, Lin Ma, stable

The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
rtnl_lock to eliminate a false data race shown below

 (FREE from device detaching)      |   (USE from netdev core)
igb_remove                         |  igb_ndo_get_vf_config
 igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
  kfree(adapter->vf_data)          |
  adapter->vfs_allocated_count = 0 |
                                   |    memcpy(... adapter->vf_data[vf]

The above race will never happen and the extra rtnl_lock causes deadlock
below

[  141.420169]  <TASK>
[  141.420672]  __schedule+0x2dd/0x840
[  141.421427]  schedule+0x50/0xc0
[  141.422041]  schedule_preempt_disabled+0x11/0x20
[  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
[  141.423324]  unregister_netdev+0xe/0x20
[  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
[  141.423791]  pci_device_remove+0x36/0xb0
[  141.423990]  device_release_driver_internal+0xc1/0x160
[  141.424270]  pci_stop_bus_device+0x6d/0x90
[  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
[  141.424789]  pci_iov_remove_virtfn+0xba/0x120
[  141.425452]  sriov_disable+0x2f/0xf0
[  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
[  141.426353]  igb_remove+0xa0/0x130 [igb]
[  141.426599]  pci_device_remove+0x36/0xb0
[  141.426796]  device_release_driver_internal+0xc1/0x160
[  141.427060]  driver_detach+0x44/0x90
[  141.427253]  bus_remove_driver+0x55/0xe0
[  141.427477]  pci_unregister_driver+0x2a/0xa0
[  141.428296]  __x64_sys_delete_module+0x141/0x2b0
[  141.429126]  ? mntput_no_expire+0x4a/0x240
[  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
[  141.429653]  do_syscall_64+0x5b/0x80
[  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
[  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.430849]  ? do_syscall_64+0x67/0x80
[  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
[  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.432482]  ? do_syscall_64+0x67/0x80
[  141.432714]  ? exc_page_fault+0x64/0x140
[  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

Since the igb_disable_sriov() will call pci_disable_sriov() before
releasing any resources, the netdev core will synchronize the cleanup to
avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
correctness.

CC: stable@vger.kernel.org
Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
Reported-by: Corinna <vinschen@redhat.com>
Link: https://lore.kernel.org/regressions/3ef31c0b-ce40-20d0-7740-5dc0cca278ca@molgen.mpg.de/
Signed-off-by: Lin Ma <linma@zju.edu.cn>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 03bc1e8af575..5532361b0e94 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3863,9 +3863,7 @@ static void igb_remove(struct pci_dev *pdev)
 	igb_release_hw_control(adapter);
 
 #ifdef CONFIG_PCI_IOV
-	rtnl_lock();
 	igb_disable_sriov(pdev);
-	rtnl_unlock();
 #endif
 
 	unregister_netdev(netdev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [PATCH] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 13:05         ` Lin Ma
@ 2023-03-07 13:17           ` Linux regression tracking (Thorsten Leemhuis)
  -1 siblings, 0 replies; 29+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-03-07 13:17 UTC (permalink / raw)
  To: Lin Ma, jesse.brandeburg, anthony.l.nguyen, davem, edumazet,
	kuba, pabeni, netdev, linux-kernel, richardcochran, ast, daniel,
	hawk, john.fastabend
  Cc: pmenzel, vinschen, intel-wired-lan, regressions, stable

On 07.03.23 14:05, Lin Ma wrote:
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
> rtnl_lock to eliminate a false data race shown below
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> [...]
> CC: stable@vger.kernel.org
> Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
> Reported-by: Corinna <vinschen@redhat.com>
> Link: https://lore.kernel.org/regressions/3ef31c0b-ce40-20d0-7740-5dc0cca278ca@molgen.mpg.de/

FWIW, that afaics should be:

Link:
https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/

(that's the parent of the mail above)

> Signed-off-by: Lin Ma <linma@zju.edu.cn>

Ciao, Thorsten
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] igb: revert rtnl_lock() that causes deadlock
@ 2023-03-07 13:17           ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 0 replies; 29+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-03-07 13:17 UTC (permalink / raw)
  To: Lin Ma, jesse.brandeburg, anthony.l.nguyen, davem, edumazet,
	kuba, pabeni, netdev, linux-kernel, richardcochran, ast, daniel,
	hawk, john.fastabend
  Cc: intel-wired-lan, pmenzel, regressions, vinschen, stable

On 07.03.23 14:05, Lin Ma wrote:
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
> rtnl_lock to eliminate a false data race shown below
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> [...]
> CC: stable@vger.kernel.org
> Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
> Reported-by: Corinna <vinschen@redhat.com>
> Link: https://lore.kernel.org/regressions/3ef31c0b-ce40-20d0-7740-5dc0cca278ca@molgen.mpg.de/

FWIW, that afaics should be:

Link:
https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/

(that's the parent of the mail above)

> Signed-off-by: Lin Ma <linma@zju.edu.cn>

Ciao, Thorsten

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [PATCH] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 13:05         ` Lin Ma
@ 2023-03-07 13:45           ` Corinna Vinschen
  -1 siblings, 0 replies; 29+ messages in thread
From: Corinna Vinschen @ 2023-03-07 13:45 UTC (permalink / raw)
  To: Lin Ma
  Cc: pmenzel, ast, hawk, daniel, netdev, richardcochran,
	john.fastabend, jesse.brandeburg, stable, linux-kernel, edumazet,
	anthony.l.nguyen, intel-wired-lan, kuba, pabeni, davem,
	regressions

On Mar  7 21:05, Lin Ma wrote:
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
> rtnl_lock to eliminate a false data race shown below
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> The above race will never happen and the extra rtnl_lock causes deadlock
> below
> [...]
> CC: stable@vger.kernel.org
> Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
> Reported-by: Corinna <vinschen@redhat.com>

Thank you, but "Corinna Vinschen", please.


Thanks,
Corinna

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] igb: revert rtnl_lock() that causes deadlock
@ 2023-03-07 13:45           ` Corinna Vinschen
  0 siblings, 0 replies; 29+ messages in thread
From: Corinna Vinschen @ 2023-03-07 13:45 UTC (permalink / raw)
  To: Lin Ma
  Cc: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk,
	john.fastabend, intel-wired-lan, pmenzel, regressions, stable

On Mar  7 21:05, Lin Ma wrote:
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
> rtnl_lock to eliminate a false data race shown below
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> The above race will never happen and the extra rtnl_lock causes deadlock
> below
> [...]
> CC: stable@vger.kernel.org
> Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
> Reported-by: Corinna <vinschen@redhat.com>

Thank you, but "Corinna Vinschen", please.


Thanks,
Corinna


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 13:45           ` Corinna Vinschen
@ 2023-03-07 15:29             ` Lin Ma
  -1 siblings, 0 replies; 29+ messages in thread
From: Lin Ma @ 2023-03-07 15:29 UTC (permalink / raw)
  To: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk,
	john.fastabend
  Cc: intel-wired-lan, pmenzel, regressions, vinschen, Lin Ma, stable

The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
rtnl_lock to eliminate a false data race shown below

 (FREE from device detaching)      |   (USE from netdev core)
igb_remove                         |  igb_ndo_get_vf_config
 igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
  kfree(adapter->vf_data)          |
  adapter->vfs_allocated_count = 0 |
                                   |    memcpy(... adapter->vf_data[vf]

The above race will never happen and the extra rtnl_lock causes deadlock
below

[  141.420169]  <TASK>
[  141.420672]  __schedule+0x2dd/0x840
[  141.421427]  schedule+0x50/0xc0
[  141.422041]  schedule_preempt_disabled+0x11/0x20
[  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
[  141.423324]  unregister_netdev+0xe/0x20
[  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
[  141.423791]  pci_device_remove+0x36/0xb0
[  141.423990]  device_release_driver_internal+0xc1/0x160
[  141.424270]  pci_stop_bus_device+0x6d/0x90
[  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
[  141.424789]  pci_iov_remove_virtfn+0xba/0x120
[  141.425452]  sriov_disable+0x2f/0xf0
[  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
[  141.426353]  igb_remove+0xa0/0x130 [igb]
[  141.426599]  pci_device_remove+0x36/0xb0
[  141.426796]  device_release_driver_internal+0xc1/0x160
[  141.427060]  driver_detach+0x44/0x90
[  141.427253]  bus_remove_driver+0x55/0xe0
[  141.427477]  pci_unregister_driver+0x2a/0xa0
[  141.428296]  __x64_sys_delete_module+0x141/0x2b0
[  141.429126]  ? mntput_no_expire+0x4a/0x240
[  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
[  141.429653]  do_syscall_64+0x5b/0x80
[  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
[  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.430849]  ? do_syscall_64+0x67/0x80
[  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
[  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.432482]  ? do_syscall_64+0x67/0x80
[  141.432714]  ? exc_page_fault+0x64/0x140
[  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

Since the igb_disable_sriov() will call pci_disable_sriov() before
releasing any resources, the netdev core will synchronize the cleanup to
avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
correctness.

CC: stable@vger.kernel.org
Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
Reported-by: Corinna Vinschen <vinschen@redhat.com>
Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/
Signed-off-by: Lin Ma <linma@zju.edu.cn>
---
V1->V2: update Link and correct Corinna's name
 drivers/net/ethernet/intel/igb/igb_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 03bc1e8af575..5532361b0e94 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3863,9 +3863,7 @@ static void igb_remove(struct pci_dev *pdev)
 	igb_release_hw_control(adapter);
 
 #ifdef CONFIG_PCI_IOV
-	rtnl_lock();
 	igb_disable_sriov(pdev);
-	rtnl_unlock();
 #endif
 
 	unregister_netdev(netdev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [Intel-wired-lan] [PATCH v2] igb: revert rtnl_lock() that causes deadlock
@ 2023-03-07 15:29             ` Lin Ma
  0 siblings, 0 replies; 29+ messages in thread
From: Lin Ma @ 2023-03-07 15:29 UTC (permalink / raw)
  To: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk,
	john.fastabend
  Cc: pmenzel, vinschen, regressions, stable, intel-wired-lan, Lin Ma

The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
rtnl_lock to eliminate a false data race shown below

 (FREE from device detaching)      |   (USE from netdev core)
igb_remove                         |  igb_ndo_get_vf_config
 igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
  kfree(adapter->vf_data)          |
  adapter->vfs_allocated_count = 0 |
                                   |    memcpy(... adapter->vf_data[vf]

The above race will never happen and the extra rtnl_lock causes deadlock
below

[  141.420169]  <TASK>
[  141.420672]  __schedule+0x2dd/0x840
[  141.421427]  schedule+0x50/0xc0
[  141.422041]  schedule_preempt_disabled+0x11/0x20
[  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
[  141.423324]  unregister_netdev+0xe/0x20
[  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
[  141.423791]  pci_device_remove+0x36/0xb0
[  141.423990]  device_release_driver_internal+0xc1/0x160
[  141.424270]  pci_stop_bus_device+0x6d/0x90
[  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
[  141.424789]  pci_iov_remove_virtfn+0xba/0x120
[  141.425452]  sriov_disable+0x2f/0xf0
[  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
[  141.426353]  igb_remove+0xa0/0x130 [igb]
[  141.426599]  pci_device_remove+0x36/0xb0
[  141.426796]  device_release_driver_internal+0xc1/0x160
[  141.427060]  driver_detach+0x44/0x90
[  141.427253]  bus_remove_driver+0x55/0xe0
[  141.427477]  pci_unregister_driver+0x2a/0xa0
[  141.428296]  __x64_sys_delete_module+0x141/0x2b0
[  141.429126]  ? mntput_no_expire+0x4a/0x240
[  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
[  141.429653]  do_syscall_64+0x5b/0x80
[  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
[  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.430849]  ? do_syscall_64+0x67/0x80
[  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
[  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
[  141.432482]  ? do_syscall_64+0x67/0x80
[  141.432714]  ? exc_page_fault+0x64/0x140
[  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

Since the igb_disable_sriov() will call pci_disable_sriov() before
releasing any resources, the netdev core will synchronize the cleanup to
avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
correctness.

CC: stable@vger.kernel.org
Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
Reported-by: Corinna Vinschen <vinschen@redhat.com>
Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/
Signed-off-by: Lin Ma <linma@zju.edu.cn>
---
V1->V2: update Link and correct Corinna's name
 drivers/net/ethernet/intel/igb/igb_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 03bc1e8af575..5532361b0e94 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3863,9 +3863,7 @@ static void igb_remove(struct pci_dev *pdev)
 	igb_release_hw_control(adapter);
 
 #ifdef CONFIG_PCI_IOV
-	rtnl_lock();
 	igb_disable_sriov(pdev);
-	rtnl_unlock();
 #endif
 
 	unregister_netdev(netdev);
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 15:29             ` [Intel-wired-lan] " Lin Ma
@ 2023-03-07 16:27               ` Corinna Vinschen
  -1 siblings, 0 replies; 29+ messages in thread
From: Corinna Vinschen @ 2023-03-07 16:27 UTC (permalink / raw)
  To: Lin Ma
  Cc: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk,
	john.fastabend, intel-wired-lan, pmenzel, regressions, stable

On Mar  7 23:29, Lin Ma wrote:
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
> rtnl_lock to eliminate a false data race shown below
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> The above race will never happen and the extra rtnl_lock causes deadlock
> below
> 
> [  141.420169]  <TASK>
> [  141.420672]  __schedule+0x2dd/0x840
> [  141.421427]  schedule+0x50/0xc0
> [  141.422041]  schedule_preempt_disabled+0x11/0x20
> [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
> [  141.423324]  unregister_netdev+0xe/0x20
> [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> [  141.423791]  pci_device_remove+0x36/0xb0
> [  141.423990]  device_release_driver_internal+0xc1/0x160
> [  141.424270]  pci_stop_bus_device+0x6d/0x90
> [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> [  141.425452]  sriov_disable+0x2f/0xf0
> [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> [  141.426353]  igb_remove+0xa0/0x130 [igb]
> [  141.426599]  pci_device_remove+0x36/0xb0
> [  141.426796]  device_release_driver_internal+0xc1/0x160
> [  141.427060]  driver_detach+0x44/0x90
> [  141.427253]  bus_remove_driver+0x55/0xe0
> [  141.427477]  pci_unregister_driver+0x2a/0xa0
> [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
> [  141.429126]  ? mntput_no_expire+0x4a/0x240
> [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
> [  141.429653]  do_syscall_64+0x5b/0x80
> [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
> [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.430849]  ? do_syscall_64+0x67/0x80
> [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
> [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.432482]  ? do_syscall_64+0x67/0x80
> [  141.432714]  ? exc_page_fault+0x64/0x140
> [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> 
> Since the igb_disable_sriov() will call pci_disable_sriov() before
> releasing any resources, the netdev core will synchronize the cleanup to
> avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
> correctness.
> 
> CC: stable@vger.kernel.org
> Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
> Reported-by: Corinna Vinschen <vinschen@redhat.com>
> Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/
> Signed-off-by: Lin Ma <linma@zju.edu.cn>

Tested-by: Corinna Vinschen <vinschen@redhat.com>


Thanks,
Corinna


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [PATCH v2] igb: revert rtnl_lock() that causes deadlock
@ 2023-03-07 16:27               ` Corinna Vinschen
  0 siblings, 0 replies; 29+ messages in thread
From: Corinna Vinschen @ 2023-03-07 16:27 UTC (permalink / raw)
  To: Lin Ma
  Cc: pmenzel, ast, hawk, daniel, netdev, richardcochran,
	john.fastabend, jesse.brandeburg, stable, linux-kernel, edumazet,
	anthony.l.nguyen, intel-wired-lan, kuba, pabeni, davem,
	regressions

On Mar  7 23:29, Lin Ma wrote:
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
> rtnl_lock to eliminate a false data race shown below
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> The above race will never happen and the extra rtnl_lock causes deadlock
> below
> 
> [  141.420169]  <TASK>
> [  141.420672]  __schedule+0x2dd/0x840
> [  141.421427]  schedule+0x50/0xc0
> [  141.422041]  schedule_preempt_disabled+0x11/0x20
> [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
> [  141.423324]  unregister_netdev+0xe/0x20
> [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> [  141.423791]  pci_device_remove+0x36/0xb0
> [  141.423990]  device_release_driver_internal+0xc1/0x160
> [  141.424270]  pci_stop_bus_device+0x6d/0x90
> [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> [  141.425452]  sriov_disable+0x2f/0xf0
> [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> [  141.426353]  igb_remove+0xa0/0x130 [igb]
> [  141.426599]  pci_device_remove+0x36/0xb0
> [  141.426796]  device_release_driver_internal+0xc1/0x160
> [  141.427060]  driver_detach+0x44/0x90
> [  141.427253]  bus_remove_driver+0x55/0xe0
> [  141.427477]  pci_unregister_driver+0x2a/0xa0
> [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
> [  141.429126]  ? mntput_no_expire+0x4a/0x240
> [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
> [  141.429653]  do_syscall_64+0x5b/0x80
> [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
> [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.430849]  ? do_syscall_64+0x67/0x80
> [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
> [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.432482]  ? do_syscall_64+0x67/0x80
> [  141.432714]  ? exc_page_fault+0x64/0x140
> [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> 
> Since the igb_disable_sriov() will call pci_disable_sriov() before
> releasing any resources, the netdev core will synchronize the cleanup to
> avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
> correctness.
> 
> CC: stable@vger.kernel.org
> Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
> Reported-by: Corinna Vinschen <vinschen@redhat.com>
> Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/
> Signed-off-by: Lin Ma <linma@zju.edu.cn>

Tested-by: Corinna Vinschen <vinschen@redhat.com>


Thanks,
Corinna

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [PATCH v2] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 15:29             ` [Intel-wired-lan] " Lin Ma
  (?)
  (?)
@ 2023-03-07 23:22             ` Jacob Keller
  -1 siblings, 0 replies; 29+ messages in thread
From: Jacob Keller @ 2023-03-07 23:22 UTC (permalink / raw)
  To: intel-wired-lan



On 3/7/2023 7:29 AM, Lin Ma wrote:
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
> rtnl_lock to eliminate a false data race shown below
> 

This patch does a number of other changes as well, including its own
spinlock, so this is only a partial revert. Ok.

>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 

How was this data race detected? KCSAN?

> The above race will never happen and the extra rtnl_lock causes deadlock
> below

So this was a theoretical data race but in practice hasn't occurred?

> 
> [  141.420169]  <TASK>
> [  141.420672]  __schedule+0x2dd/0x840
> [  141.421427]  schedule+0x50/0xc0
> [  141.422041]  schedule_preempt_disabled+0x11/0x20
> [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
> [  141.423324]  unregister_netdev+0xe/0x20
> [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> [  141.423791]  pci_device_remove+0x36/0xb0
> [  141.423990]  device_release_driver_internal+0xc1/0x160
> [  141.424270]  pci_stop_bus_device+0x6d/0x90
> [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> [  141.425452]  sriov_disable+0x2f/0xf0
> [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> [  141.426353]  igb_remove+0xa0/0x130 [igb]
> [  141.426599]  pci_device_remove+0x36/0xb0
> [  141.426796]  device_release_driver_internal+0xc1/0x160
> [  141.427060]  driver_detach+0x44/0x90
> [  141.427253]  bus_remove_driver+0x55/0xe0
> [  141.427477]  pci_unregister_driver+0x2a/0xa0
> [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
> [  141.429126]  ? mntput_no_expire+0x4a/0x240
> [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
> [  141.429653]  do_syscall_64+0x5b/0x80
> [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
> [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.430849]  ? do_syscall_64+0x67/0x80
> [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
> [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.432482]  ? do_syscall_64+0x67/0x80
> [  141.432714]  ? exc_page_fault+0x64/0x140
> [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > Since the igb_disable_sriov() will call pci_disable_sriov() before
> releasing any resources, the netdev core will synchronize the cleanup to
> avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
> correctness.
> 

you could say "harmful" rather than "useless" here.

igb is holding the RTNL lock while calling igb_disable_sriov which then
tries to remove the VF driver which then takes the RTNL lock and deadlocks.

This was possibly not caught before because if the igbvf device was in a
VM, this would not deadlock since the VF driver would not be loaded in
host, and would thus not acquire the host RTNL lock.

Ok.

This makes sense how it might not have been caught before.

The original commit referenced that the RTNL lock was added similar to
719479230893 ("dpaa2-eth: add MAC/PHY support through phylink") so I
wanted to check if that might also be buggy. Turns out no, its actually
seemingly unrelated to this broken use of RTNL lock as its holding
something for disconnecting a phylink object, but has nothing to do with
SR-IOV.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

Thanks,
Jake

> CC: stable@vger.kernel.org
> Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
> Reported-by: Corinna Vinschen <vinschen@redhat.com>
> Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/
> Signed-off-by: Lin Ma <linma@zju.edu.cn>
> ---
> V1->V2: update Link and correct Corinna's name
>  drivers/net/ethernet/intel/igb/igb_main.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 03bc1e8af575..5532361b0e94 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -3863,9 +3863,7 @@ static void igb_remove(struct pci_dev *pdev)
>  	igb_release_hw_control(adapter);
>  
>  #ifdef CONFIG_PCI_IOV
> -	rtnl_lock();
>  	igb_disable_sriov(pdev);
> -	rtnl_unlock();
>  #endif
>  
>  	unregister_netdev(netdev);
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 15:29             ` [Intel-wired-lan] " Lin Ma
@ 2023-03-08 12:04               ` Simon Horman
  -1 siblings, 0 replies; 29+ messages in thread
From: Simon Horman @ 2023-03-08 12:04 UTC (permalink / raw)
  To: Lin Ma
  Cc: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba,
	pabeni, netdev, linux-kernel, richardcochran, ast, daniel, hawk,
	john.fastabend, intel-wired-lan, pmenzel, regressions, vinschen,
	stable

On Tue, Mar 07, 2023 at 11:29:17PM +0800, Lin Ma wrote:
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
> rtnl_lock to eliminate a false data race shown below
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> The above race will never happen and the extra rtnl_lock causes deadlock
> below
> 
> [  141.420169]  <TASK>
> [  141.420672]  __schedule+0x2dd/0x840
> [  141.421427]  schedule+0x50/0xc0
> [  141.422041]  schedule_preempt_disabled+0x11/0x20
> [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
> [  141.423324]  unregister_netdev+0xe/0x20
> [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> [  141.423791]  pci_device_remove+0x36/0xb0
> [  141.423990]  device_release_driver_internal+0xc1/0x160
> [  141.424270]  pci_stop_bus_device+0x6d/0x90
> [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> [  141.425452]  sriov_disable+0x2f/0xf0
> [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> [  141.426353]  igb_remove+0xa0/0x130 [igb]
> [  141.426599]  pci_device_remove+0x36/0xb0
> [  141.426796]  device_release_driver_internal+0xc1/0x160
> [  141.427060]  driver_detach+0x44/0x90
> [  141.427253]  bus_remove_driver+0x55/0xe0
> [  141.427477]  pci_unregister_driver+0x2a/0xa0
> [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
> [  141.429126]  ? mntput_no_expire+0x4a/0x240
> [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
> [  141.429653]  do_syscall_64+0x5b/0x80
> [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
> [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.430849]  ? do_syscall_64+0x67/0x80
> [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
> [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.432482]  ? do_syscall_64+0x67/0x80
> [  141.432714]  ? exc_page_fault+0x64/0x140
> [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> 
> Since the igb_disable_sriov() will call pci_disable_sriov() before
> releasing any resources, the netdev core will synchronize the cleanup to
> avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
> correctness.
> 
> CC: stable@vger.kernel.org
> Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
> Reported-by: Corinna Vinschen <vinschen@redhat.com>
> Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/
> Signed-off-by: Lin Ma <linma@zju.edu.cn>

Reviewed-by: Simon Horman <simon.horman@corigine.com>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [PATCH v2] igb: revert rtnl_lock() that causes deadlock
@ 2023-03-08 12:04               ` Simon Horman
  0 siblings, 0 replies; 29+ messages in thread
From: Simon Horman @ 2023-03-08 12:04 UTC (permalink / raw)
  To: Lin Ma
  Cc: pmenzel, ast, hawk, daniel, netdev, richardcochran,
	john.fastabend, jesse.brandeburg, stable, linux-kernel, vinschen,
	edumazet, anthony.l.nguyen, intel-wired-lan, kuba, pabeni, davem,
	regressions

On Tue, Mar 07, 2023 at 11:29:17PM +0800, Lin Ma wrote:
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds
> rtnl_lock to eliminate a false data race shown below
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> The above race will never happen and the extra rtnl_lock causes deadlock
> below
> 
> [  141.420169]  <TASK>
> [  141.420672]  __schedule+0x2dd/0x840
> [  141.421427]  schedule+0x50/0xc0
> [  141.422041]  schedule_preempt_disabled+0x11/0x20
> [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
> [  141.423324]  unregister_netdev+0xe/0x20
> [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
> [  141.423791]  pci_device_remove+0x36/0xb0
> [  141.423990]  device_release_driver_internal+0xc1/0x160
> [  141.424270]  pci_stop_bus_device+0x6d/0x90
> [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
> [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
> [  141.425452]  sriov_disable+0x2f/0xf0
> [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
> [  141.426353]  igb_remove+0xa0/0x130 [igb]
> [  141.426599]  pci_device_remove+0x36/0xb0
> [  141.426796]  device_release_driver_internal+0xc1/0x160
> [  141.427060]  driver_detach+0x44/0x90
> [  141.427253]  bus_remove_driver+0x55/0xe0
> [  141.427477]  pci_unregister_driver+0x2a/0xa0
> [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
> [  141.429126]  ? mntput_no_expire+0x4a/0x240
> [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
> [  141.429653]  do_syscall_64+0x5b/0x80
> [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
> [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.430849]  ? do_syscall_64+0x67/0x80
> [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
> [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
> [  141.432482]  ? do_syscall_64+0x67/0x80
> [  141.432714]  ? exc_page_fault+0x64/0x140
> [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> 
> Since the igb_disable_sriov() will call pci_disable_sriov() before
> releasing any resources, the netdev core will synchronize the cleanup to
> avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
> correctness.
> 
> CC: stable@vger.kernel.org
> Fixes: 6faee3d4ee8b ("igb: Add lock to avoid data race")
> Reported-by: Corinna Vinschen <vinschen@redhat.com>
> Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/
> Signed-off-by: Lin Ma <linma@zju.edu.cn>

Reviewed-by: Simon Horman <simon.horman@corigine.com>

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-wired-lan] [PATCH v2] igb: revert rtnl_lock() that causes deadlock
  2023-03-07 15:29             ` [Intel-wired-lan] " Lin Ma
                               ` (3 preceding siblings ...)
  (?)
@ 2023-03-16  9:10             ` Romanowski, Rafal
  -1 siblings, 0 replies; 29+ messages in thread
From: Romanowski, Rafal @ 2023-03-16  9:10 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Lin Ma
> Sent: wtorek, 7 marca 2023 16:29
> To: Brandeburg, Jesse <jesse.brandeburg@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; davem@davemloft.net;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> richardcochran@gmail.com; ast@kernel.org; daniel@iogearbox.net;
> hawk@kernel.org; john.fastabend@gmail.com
> Cc: pmenzel@molgen.mpg.de; Vinschen, Corinna <vinschen@redhat.com>;
> regressions@lists.linux.dev; stable@vger.kernel.org; intel-wired-
> lan@lists.osuosl.org; Lin Ma <linma@zju.edu.cn>
> Subject: [Intel-wired-lan] [PATCH v2] igb: revert rtnl_lock() that causes
> deadlock
> 
> The commit 6faee3d4ee8b ("igb: Add lock to avoid data race") adds rtnl_lock
> to eliminate a false data race shown below
> 
>  (FREE from device detaching)      |   (USE from netdev core)
> igb_remove                         |  igb_ndo_get_vf_config
>  igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
>   kfree(adapter->vf_data)          |
>   adapter->vfs_allocated_count = 0 |
>                                    |    memcpy(... adapter->vf_data[vf]
> 
> The above race will never happen and the extra rtnl_lock causes deadlock
> below
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index 03bc1e8af575..5532361b0e94 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -3863,9 +3863,7 @@ static void igb_remove(struct pci_dev *pdev)
>  	igb_release_hw_control(adapter);
> 
>  #ifdef CONFIG_PCI_IOV
> -	rtnl_lock();
>  	igb_disable_sriov(pdev);
> -	rtnl_unlock();
>  #endif
> 
>  	unregister_netdev(netdev);
> --


Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>



_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-03-16  9:11 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-07  9:54 [Intel-wired-lan] Deadlock since commit 6faee3d4ee8b ("igb: Add lock to avoid data race") Corinna Vinschen
2023-03-07 10:12 ` [Intel-wired-lan] [REGRESSION] " Paul Menzel
2023-03-07 10:12   ` Paul Menzel
2023-03-07 10:19   ` Paul Menzel
2023-03-07 10:19     ` Paul Menzel
2023-03-07 10:36     ` Lin Ma
2023-03-07 10:36       ` Lin Ma
2023-03-07 12:11       ` Corinna Vinschen
2023-03-07 12:11         ` Corinna Vinschen
2023-03-07 12:37         ` Lin Ma
2023-03-07 12:37           ` Lin Ma
2023-03-07 12:55           ` Corinna Vinschen
2023-03-07 12:55             ` Corinna Vinschen
2023-03-07 11:48     ` Lin Ma
2023-03-07 11:48       ` Lin Ma
2023-03-07 13:05       ` [Intel-wired-lan] [PATCH] igb: revert rtnl_lock() that causes deadlock Lin Ma
2023-03-07 13:05         ` Lin Ma
2023-03-07 13:17         ` [Intel-wired-lan] " Linux regression tracking (Thorsten Leemhuis)
2023-03-07 13:17           ` Linux regression tracking (Thorsten Leemhuis)
2023-03-07 13:45         ` [Intel-wired-lan] " Corinna Vinschen
2023-03-07 13:45           ` Corinna Vinschen
2023-03-07 15:29           ` [PATCH v2] " Lin Ma
2023-03-07 15:29             ` [Intel-wired-lan] " Lin Ma
2023-03-07 16:27             ` Corinna Vinschen
2023-03-07 16:27               ` [Intel-wired-lan] " Corinna Vinschen
2023-03-07 23:22             ` Jacob Keller
2023-03-08 12:04             ` Simon Horman
2023-03-08 12:04               ` [Intel-wired-lan] " Simon Horman
2023-03-16  9:10             ` Romanowski, Rafal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.