[Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn

* [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset
@ 2022-07-14  8:14 James Hogan
  2022-07-15 17:25 ` Tony Nguyen
  2022-07-17 19:59 ` Vinicius Costa Gomes
  0 siblings, 2 replies; 38+ messages in thread
From: James Hogan @ 2022-07-14  8:14 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: Jesse Brandeburg

Hi,

I'm getting regular hangs after resume from suspend with the igc driver, for
an I225-V (rev 3) on an MSI Pro Z690-A, with version 5.18.11 on archlinux. A few
stable versions ago it was possible to get the network back up by removing and
reloading the igc driver, however now I get the following, and only a reboot
works (which itself hangs before actually restarting the machine, and requires
a hard reset).

Any ideas?

INFO: task NetworkManager:1139 blocked for more than 124 seconds.
      Not tainted 5.18.11-arch1-1 #1  
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:NetworkManager  state:D stack:    0 pid: 1139 ppid:     1 flags:0x00004002
Call Trace:
 <TASK>
 __schedule+0x37c/0x11f0
 schedule+0x4f/0xb0
 schedule_preempt_disabled+0x15/0x20
 __mutex_lock.constprop.0+0x2d0/0x480
 ? igc_tsn_reset+0x64/0x100 [igc c7b6f7549edcf5dd76637367233bb9aa57fc35fd]
 igc_resume+0xf6/0x1d0 [igc c7b6f7549edcf5dd76637367233bb9aa57fc35fd]
 pci_pm_runtime_resume+0xab/0xd0 
 ? pci_pm_freeze_noirq+0xe0/0xe0 
 __rpm_callback+0x41/0x160
 rpm_callback+0x35/0x70
 ? pci_pm_freeze_noirq+0xe0/0xe0 
 rpm_resume+0x5e6/0x820
 __pm_runtime_resume+0x4b/0x80
 dev_ethtool+0x128/0x3060
 ? inet_ioctl+0xdc/0x1e0
 dev_ioctl+0x157/0x520
 sock_do_ioctl+0xd7/0x120
 sock_ioctl+0xee/0x330
 ? syscall_exit_to_user_mode+0x26/0x50
 ? do_syscall_64+0x6b/0x90
 ? syscall_exit_to_user_mode+0x26/0x50
 __x64_sys_ioctl+0x8e/0xc0
 do_syscall_64+0x5c/0x90
 ? do_syscall_64+0x6b/0x90
 ? do_syscall_64+0x6b/0x90
 ? do_syscall_64+0x6b/0x90
 ? syscall_exit_to_user_mode+0x26/0x50
 ? do_syscall_64+0x6b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f5d37b077af
RSP: 002b:00007ffd791600f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffd791603e0 RCX: 00007f5d37b077af
RDX: 00007ffd79160210 RSI: 0000000000008946 RDI: 0000000000000013
RBP: 00007ffd79160390 R08: 0000000000000000 R09: 00007ffd791603e8
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd79160210 R14: 00007ffd791601f0 R15: 00007ffd791601f0
 </TASK>

Thanks
James


_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset
  2022-07-14  8:14 [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset James Hogan
@ 2022-07-15 17:25 ` Tony Nguyen
  2022-07-17 19:59 ` Vinicius Costa Gomes
  1 sibling, 0 replies; 38+ messages in thread
From: Tony Nguyen @ 2022-07-15 17:25 UTC (permalink / raw)
  To: James Hogan, intel-wired-lan, Sasha Neftin, Gomes, Vinicius
  Cc: Jesse Brandeburg

Adding a couple people who work on igc.

On 7/14/2022 1:14 AM, James Hogan wrote:
> Hi,
> 
> I'm getting regular hangs after resume from suspend with the igc driver, for
> an I225-V (rev 3) on an MSI Pro Z690-A, with version 5.18.11 on archlinux. A few
> stable versions ago it was possible to get the network back up by removing and
> reloading the igc driver, however now I get the following, and only a reboot
> works (which itself hangs before actually restarting the machine, and requires
> a hard reset).
> 
> Any ideas?
> 
> INFO: task NetworkManager:1139 blocked for more than 124 seconds.
>        Not tainted 5.18.11-arch1-1 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:NetworkManager  state:D stack:    0 pid: 1139 ppid:     1 flags:0x00004002
> Call Trace:
>   <TASK>
>   __schedule+0x37c/0x11f0
>   schedule+0x4f/0xb0
>   schedule_preempt_disabled+0x15/0x20
>   __mutex_lock.constprop.0+0x2d0/0x480
>   ? igc_tsn_reset+0x64/0x100 [igc c7b6f7549edcf5dd76637367233bb9aa57fc35fd]
>   igc_resume+0xf6/0x1d0 [igc c7b6f7549edcf5dd76637367233bb9aa57fc35fd]
>   pci_pm_runtime_resume+0xab/0xd0
>   ? pci_pm_freeze_noirq+0xe0/0xe0
>   __rpm_callback+0x41/0x160
>   rpm_callback+0x35/0x70
>   ? pci_pm_freeze_noirq+0xe0/0xe0
>   rpm_resume+0x5e6/0x820
>   __pm_runtime_resume+0x4b/0x80
>   dev_ethtool+0x128/0x3060
>   ? inet_ioctl+0xdc/0x1e0
>   dev_ioctl+0x157/0x520
>   sock_do_ioctl+0xd7/0x120
>   sock_ioctl+0xee/0x330
>   ? syscall_exit_to_user_mode+0x26/0x50
>   ? do_syscall_64+0x6b/0x90
>   ? syscall_exit_to_user_mode+0x26/0x50
>   __x64_sys_ioctl+0x8e/0xc0
>   do_syscall_64+0x5c/0x90
>   ? do_syscall_64+0x6b/0x90
>   ? do_syscall_64+0x6b/0x90
>   ? do_syscall_64+0x6b/0x90
>   ? syscall_exit_to_user_mode+0x26/0x50
>   ? do_syscall_64+0x6b/0x90
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f5d37b077af
> RSP: 002b:00007ffd791600f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007ffd791603e0 RCX: 00007f5d37b077af
> RDX: 00007ffd79160210 RSI: 0000000000008946 RDI: 0000000000000013
> RBP: 00007ffd79160390 R08: 0000000000000000 R09: 00007ffd791603e8
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffd79160210 R14: 00007ffd791601f0 R15: 00007ffd791601f0
>   </TASK>
> 
> Thanks
> James
> 
> 

* Re: [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset
  2022-07-14  8:14 [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset James Hogan
  2022-07-15 17:25 ` Tony Nguyen
@ 2022-07-17 19:59 ` Vinicius Costa Gomes
       [not found]   ` <4773114.31r3eYUQgx@saruman>
  1 sibling, 1 reply; 38+ messages in thread
From: Vinicius Costa Gomes @ 2022-07-17 19:59 UTC (permalink / raw)
  To: James Hogan, intel-wired-lan; +Cc: Jesse Brandeburg

Hi James,

James Hogan <jhogan@kernel.org> writes:

> Hi,
>
> I'm getting regular hangs after resume from suspend with the igc driver, for
> an I225-V (rev 3) on an MSI Pro Z690-A, with version 5.18.11 on archlinux. A few
> stable versions ago it was possible to get the network back up by removing and
> reloading the igc driver, however now I get the following, and only a reboot
> works (which itself hangs before actually restarting the machine, and requires
> a hard reset).
>

Sorry for the delay. I was travelling.

I remember seeing some weird behaviors with PCIe PTM and suspend/resume,
especially with onboard controllers.

Can you see if disabling CONFIG_PCIE_PTM in your kernel config changes
anything? (assuming it's enabled)


Cheers,
-- 
Vinicius

* Re: [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset
       [not found]   ` <4773114.31r3eYUQgx@saruman>
@ 2022-07-23 15:52     ` James Hogan
  2022-07-27 14:37       ` Vinicius Costa Gomes
  0 siblings, 1 reply; 38+ messages in thread
From: James Hogan @ 2022-07-23 15:52 UTC (permalink / raw)
  To: Vinicius Costa Gomes, Jesse Brandeburg; +Cc: intel-wired-lan

On Sunday, 17 July 2022 22:40:59 BST James Hogan wrote:
> On Sunday, 17 July 2022 20:59:36 BST you wrote:
> > Hi James,
> > 
> > James Hogan <jhogan@kernel.org> writes:
> > > Hi,
> > > 
> > > I'm getting regular hangs after resume from suspend with the igc driver,
> > > for an I225-V (rev 3) on an MSI Pro Z690-A, with version 5.18.11 on
> > > archlinux. A few stable versions ago it was possible to get the network
> > > back up by removing and reloading the igc driver, however now I get the
> > > following, and only a reboot works (which itself hangs before actually
> > > restarting the machine, and requires a hard reset).
> > 
> > Sorry for the delay. I was travelling.
> 
> No worries
> 
> > I remember seeing some weird behaviors with PCIe PTM and suspend/resume,
> > especially with onboard controllers.
> 
> It appears that the hardware got itself into a funny state such that
> NetworkManager hung as described more often than not on resume, however
> without changing kernel it has now settled back into the previous behaviour
> of usually working, but occasionally (maybe 1 in 5) the network wouldn't
> come back up on resume, with network related things hung until I unload and
> reload the igc module.
> 
> > Can you see if disabling CONFIG_PCIE_PTM in your kernel config changes
> > anything? (assuming it's enabled)
> 
> It is enabled yes. Okay I'll give it a go when I get the chance. I'll likely
> have to do a bunch of boot and suspend cycles to try and get it back into
> either failure condition.

(sorry somehow dropped others off cc the other day, now adding back)...

I've been running most of this week with 5.18.12-arch1-1, rebuilt with
CONFIG_PCIE_PTM=n, however I have now observed both cases.

It failed to bring up the network link a couple of times after resume from
suspend, and I managed to remove the igc module and reload it to get it going
again.

Another time it failed to come back up, but reloading the module didn't help.

I also hit the igc_tsn_reset hang, but this time it was immediately after boot
(possibly a warm reset), where it failed to bring up the network at all. I'll
paste the full backtraces of hung tasks below.

I'm wondering whether, since most of the tasks are stuck trying to acquire a
mutex, the issue is elsewhere. In some past cases though all the tasks that
are dumped are at a mutex_lock...

Cheers
James



INFO: task kworker/u40:13:659 blocked for more than 122 seconds.
      Not tainted 5.18.12-arch1-1 #1  
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u40:13  state:D stack:    0 pid:  659 ppid:     2 flags:0x00004000
Workqueue: netns cleanup_net
Call Trace:
 <TASK>
 __schedule+0x37c/0x11f0
 ? schedule+0x4f/0xb0
 ? wq_worker_running+0xe/0x50
 ? schedule_timeout+0x72/0x150
 schedule+0x4f/0xb0
 schedule_preempt_disabled+0x15/0x20
 __mutex_lock.constprop.0+0x2d0/0x480
 default_device_exit_batch+0x38/0x270
 ? synchronize_rcu+0x8b/0xa0
 ? rcu_gp_kthread+0x140/0x140
 cleanup_net+0x221/0x3b0
 process_one_work+0x1c4/0x380
 worker_thread+0x51/0x380
 ? rescuer_thread+0x3a0/0x3a0
 kthread+0xdb/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
INFO: task NetworkManager:876 blocked for more than 122 seconds.
      Not tainted 5.18.12-arch1-1 #1  
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:NetworkManager  state:D stack:    0 pid:  876 ppid:     1 flags:0x00004002
Call Trace:
 <TASK>
 __schedule+0x37c/0x11f0
 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 schedule+0x4f/0xb0
 schedule_preempt_disabled+0x15/0x20
 __mutex_lock.constprop.0+0x2d0/0x480
 ? igc_tsn_reset+0x64/0x100 [igc 9c97d54db6c7f40531170f2b5af74f206f34f4f1]
 igc_resume+0xf6/0x1d0 [igc 9c97d54db6c7f40531170f2b5af74f206f34f4f1]
 pci_pm_runtime_resume+0xab/0xd0 
 ? pci_pm_freeze_noirq+0xe0/0xe0 
 __rpm_callback+0x41/0x160
 rpm_callback+0x35/0x70
 ? pci_pm_freeze_noirq+0xe0/0xe0 
 rpm_resume+0x5e6/0x820
 __pm_runtime_resume+0x4b/0x80
 dev_ethtool+0x128/0x3060
 ? refill_stock+0x1a/0x30
 ? try_charge_memcg+0x779/0x7c0
 ? xa_load+0x8b/0xe0
 ? inet_ioctl+0xdc/0x1e0
 dev_ioctl+0x157/0x520
 sock_do_ioctl+0xd7/0x120
 ? kmem_cache_free+0x189/0x380
 sock_ioctl+0xee/0x330
 ? call_rcu+0xa1/0x290
 __x64_sys_ioctl+0x8e/0xc0
 do_syscall_64+0x5c/0x90
 ? do_syscall_64+0x6b/0x90
 ? syscall_exit_to_user_mode+0x26/0x50
 ? do_syscall_64+0x6b/0x90
 ? syscall_exit_to_user_mode+0x26/0x50
 ? do_syscall_64+0x6b/0x90
 ? syscall_exit_to_user_mode+0x26/0x50
 ? do_syscall_64+0x6b/0x90
 ? do_syscall_64+0x6b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f128f3077af
RSP: 002b:00007ffc5fc81710 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffc5fc81a00 RCX: 00007f128f3077af
RDX: 00007ffc5fc81830 RSI: 0000000000008946 RDI: 0000000000000012
RBP: 00007ffc5fc819b0 R08: 0000000000000000 R09: 00007ffc5fc81a08
R10: 0000000000000011 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc5fc81830 R14: 00007ffc5fc81810 R15: 00007ffc5fc81810
 </TASK>
INFO: task kdeconnectd:1578 blocked for more than 122 seconds.
      Not tainted 5.18.12-arch1-1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kdeconnectd     state:D stack:    0 pid: 1578 ppid:  1351 flags:0x00000006
Call Trace:
 <TASK>
 __schedule+0x37c/0x11f0
 schedule+0x4f/0xb0
 schedule_preempt_disabled+0x15/0x20
 __mutex_lock.constprop.0+0x2d0/0x480
 __netlink_dump_start+0xc5/0x2f0
 ? rtnl_fill_ifinfo+0x1300/0x1300
 rtnetlink_rcv_msg+0x264/0x370
 ? rtnl_fill_ifinfo+0x1300/0x1300
 ? rtnl_calcit.isra.0+0x140/0x140
 netlink_rcv_skb+0x52/0x100
 netlink_unicast+0x240/0x390
 netlink_sendmsg+0x254/0x4b0
 sock_sendmsg+0x5d/0x70
 __sys_sendto+0x117/0x160
 __x64_sys_sendto+0x24/0x30
 do_syscall_64+0x5c/0x90
 ? handle_mm_fault+0xb2/0x280
 ? do_user_addr_fault+0x1db/0x680
 ? exc_page_fault+0x74/0x170
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2f46d136ac
RSP: 002b:00007ffdb8440c00 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f2f46d136ac
RDX: 0000000000000020 RSI: 00007ffdb8440cb0 RDI: 000000000000000b
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdb8440c80
R13: 00007f2f473ff3b0 R14: 0000564103b1d230 R15: 00007ffdb8440d40
 </TASK>
INFO: task DiscoverNotifie:1603 blocked for more than 122 seconds.
      Not tainted 5.18.12-arch1-1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:DiscoverNotifie state:D stack:    0 pid: 1603 ppid:  1351 flags:0x00000002
Call Trace:
 <TASK>
 __schedule+0x37c/0x11f0
 schedule+0x4f/0xb0
 schedule_preempt_disabled+0x15/0x20
 __mutex_lock.constprop.0+0x2d0/0x480
 __netlink_dump_start+0xc5/0x2f0
 ? rtnl_fill_ifinfo+0x1300/0x1300
 rtnetlink_rcv_msg+0x264/0x370
 ? rtnl_fill_ifinfo+0x1300/0x1300
 ? rtnl_calcit.isra.0+0x140/0x140
 netlink_rcv_skb+0x52/0x100
 netlink_unicast+0x240/0x390
 netlink_sendmsg+0x254/0x4b0
 sock_sendmsg+0x5d/0x70
 __sys_sendto+0x117/0x160
 __x64_sys_sendto+0x24/0x30
 do_syscall_64+0x5c/0x90
 ? do_user_addr_fault+0x1db/0x680
 ? exc_page_fault+0x74/0x170
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f93d55136ac
RSP: 002b:00007ffd2dceece0 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f93d55136ac
RDX: 0000000000000020 RSI: 00007ffd2dceed90 RDI: 000000000000000d
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd2dceed60
R13: 00007f93d75a43b0 R14: 00007f93c400ae20 R15: 00007ffd2dceee20
 </TASK>
INFO: task packagekitd:1621 blocked for more than 122 seconds.
      Not tainted 5.18.12-arch1-1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:packagekitd     state:D stack:    0 pid: 1621 ppid:     1 flags:0x00000006
Call Trace:
 <TASK>
 __schedule+0x37c/0x11f0
 schedule+0x4f/0xb0
 schedule_preempt_disabled+0x15/0x20
 __mutex_lock.constprop.0+0x2d0/0x480
 __netlink_dump_start+0xc5/0x2f0
 ? validate_linkmsg+0x130/0x130
 rtnetlink_rcv_msg+0x264/0x370
 ? validate_linkmsg+0x130/0x130
 ? rtnl_calcit.isra.0+0x140/0x140
 netlink_rcv_skb+0x52/0x100
 netlink_unicast+0x240/0x390
 netlink_sendmsg+0x254/0x4b0
 sock_sendmsg+0x5d/0x70
 __sys_sendto+0x117/0x160
 __x64_sys_sendto+0x24/0x30
 do_syscall_64+0x5c/0x90
 ? do_syscall_64+0x6b/0x90
 ? syscall_exit_to_user_mode+0x26/0x50
 ? do_syscall_64+0x6b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fa7add13660
RSP: 002b:00007ffc2b9d5728 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000560775332130 RCX: 00007fa7add13660
RDX: 0000000000000014 RSI: 00007ffc2b9d5780 RDI: 0000000000000005
RBP: 00007ffc2b9d5780 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000004000 R11: 0000000000000246 R12: 0000000000000014
R13: 00007ffc2b9d5950 R14: 0000000000000000 R15: 0000000000000001
 </TASK>
INFO: task vulkaninfo:1756 blocked for more than 122 seconds.
      Not tainted 5.18.12-arch1-1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:vulkaninfo      state:D stack:    0 pid: 1756 ppid:  1679 flags:0x00000006
Call Trace:
 <TASK>
 __schedule+0x37c/0x11f0
 ? mntput_no_expire+0x4a/0x280
 schedule+0x4f/0xb0
 __pm_runtime_barrier+0xa1/0x160
 ? cpuacct_percpu_seq_show+0x20/0x20
 pm_runtime_barrier+0x4c/0x90
 pci_config_pm_runtime_get+0x3a/0x60
 pci_read_config+0x99/0x2e0
 ? __kmalloc+0x171/0x380
 kernfs_fop_read_iter+0xa7/0x1a0
 new_sync_read+0x137/0x1c0
 vfs_read+0x145/0x190
 ksys_read+0x6f/0xf0
 do_syscall_64+0x5c/0x90
 ? exit_to_user_mode_prepare+0x111/0x140
 ? syscall_exit_to_user_mode+0x26/0x50
 ? do_syscall_64+0x6b/0x90
 ? do_syscall_64+0x6b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f9b3e301b82
RSP: 002b:00007fff32951908 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 00007fff32951a90 RCX: 00007f9b3e301b82
RDX: 0000000000000030 RSI: 00007fff32951a90 RDI: 0000000000000008
RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000030
R13: 0000000000000000 R14: 0000000000000030 R15: 0000000000000030
 </TASK>
INFO: task firefox:1757 blocked for more than 122 seconds.
      Not tainted 5.18.12-arch1-1 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:firefox         state:D stack:    0 pid: 1757 ppid:  1422 flags:0x00000002
Call Trace:
 <TASK>
 __schedule+0x37c/0x11f0
 schedule+0x4f/0xb0
 schedule_preempt_disabled+0x15/0x20
 __mutex_lock.constprop.0+0x2d0/0x480
 ? memcg_slab_post_alloc_hook+0x19e/0x230
 __netlink_dump_start+0xc5/0x2f0
 ? inet_valid_dump_ifaddr_req.constprop.0+0x1d0/0x1d0
 rtnetlink_rcv_msg+0x264/0x370
 ? inet_valid_dump_ifaddr_req.constprop.0+0x1d0/0x1d0
 ? rtnl_calcit.isra.0+0x140/0x140
 netlink_rcv_skb+0x52/0x100
 netlink_unicast+0x240/0x390
 netlink_sendmsg+0x254/0x4b0
 sock_sendmsg+0x5d/0x70
 __sys_sendto+0x117/0x160
 __x64_sys_sendto+0x24/0x30
 do_syscall_64+0x5c/0x90
 ? syscall_exit_to_user_mode+0x26/0x50
 ? do_syscall_64+0x6b/0x90
 ? syscall_exit_to_user_mode+0x26/0x50
 ? do_syscall_64+0x6b/0x90
 ? do_syscall_64+0x6b/0x90
 ? do_syscall_64+0x6b/0x90
 ? exc_page_fault+0x74/0x170
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fbcbb513814
RSP: 002b:00007fff67d31230 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007fbcabc52b80 RCX: 00007fbcbb513814
RDX: 0000000000000018 RSI: 00007fbcabc82e60 RDI: 000000000000000b
RBP: 0000000000000000 R08: 00007fff67d312e0 R09: 0000000000000080
R10: 0000000000000000 R11: 0000000000000293 R12: 000000000367d10b
R13: 0000000000000001 R14: 00007fbcabc17100 R15: 0000000000000000
 </TASK>




* Re: [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset
  2022-07-23 15:52     ` James Hogan
@ 2022-07-27 14:37       ` Vinicius Costa Gomes
  2022-07-28 17:36         ` James Hogan
  0 siblings, 1 reply; 38+ messages in thread
From: Vinicius Costa Gomes @ 2022-07-27 14:37 UTC (permalink / raw)
  To: James Hogan, Jesse Brandeburg; +Cc: intel-wired-lan

Hi James,

James Hogan <jhogan@kernel.org> writes:

> On Sunday, 17 July 2022 22:40:59 BST James Hogan wrote:
>> On Sunday, 17 July 2022 20:59:36 BST you wrote:
>> > Hi James,
>> > 
>> > James Hogan <jhogan@kernel.org> writes:
>> > > Hi,
>> > > 
>> > > I'm getting regular hangs after resume from suspend with the igc driver,
>> > > for an I225-V (rev 3) on an MSI Pro Z690-A, with version 5.18.11 on
>> > > archlinux. A few stable versions ago it was possible to get the network
>> > > back up by removing and reloading the igc driver, however now I get the
>> > > following, and only a reboot works (which itself hangs before actually
>> > > restarting the machine, and requires a hard reset).
>> > 
>> > Sorry for the delay. I was travelling.
>> 
>> No worries
>> 
>> > I remember seeing some weird behaviors with PCIe PTM and suspend/resume,
>> > especially with onboard controllers.
>> 
>> It appears that the hardware got itself into a funny state such that
>> NetworkManager hung as described more often than not on resume, however
>> without changing kernel it has now settled back into the previous behaviour
>> of usually working, but occasionally (maybe 1 in 5) the network wouldn't
>> come back up on resume, with network related things hung until I unload and
>> reload the igc module.
>> 
>> > Can you see if disabling CONFIG_PCIE_PTM in your kernel config changes
>> > anything? (assuming it's enabled)
>> 
>> It is enabled yes. Okay I'll give it a go when I get the chance. I'll likely
>> have to do a bunch of boot and suspend cycles to try and get it back into
>> either failure condition.
>
> (sorry somehow dropped others off cc the other day, now adding back)...
>
> I've been running most of this week with 5.18.12-arch1-1, rebuilt with
> CONFIG_PCIE_PTM=n, however I have now observed both cases.
>
> It failed to bring up the network link a couple of times after resume from
> suspend, and I managed to remove the igc module and reload it to get it going
> again.
>
> Another time it failed to come back up, but reloading the module didn't help.
>
> I also hit the igc_tsn_reset hang, but this time it was immediately after boot
> (possibly a warm reset), where it failed to bring up the network at all. I'll
> paste the full backtraces of hung tasks below.
>
> I'm wondering whether, since most of the tasks are stuck trying to acquire a
> mutex, the issue is elsewhere. In some past cases though all the tasks that
> are dumped are at a mutex_lock...

Yeah, I agree that it seems like the issue is something else. I would
suggest starting with the "simple" things: enabling 'CONFIG_PROVE_LOCKING'
and looking at the first splat. It could be that what you are seeing is
caused by a deadlock somewhere else.
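As a rough illustration of the kind of bug lockdep's recursive-locking check reports, here is a minimal userspace sketch. It is only an analogy: `CheckedLock`, `fake_dev_ethtool()` and `fake_igc_resume()` are hypothetical stand-ins (the call chain is borrowed from the dev_ethtool -> igc_resume trace above), not kernel APIs, and real lockdep tracks lock classes and dependency chains rather than a single owner field. The wrapper remembers which thread holds a non-recursive lock and reports a second acquisition by that same thread instead of blocking forever.

```python
import threading


class CheckedLock:
    """Hypothetical toy stand-in for a lockdep-checked kernel mutex.

    Wraps a non-recursive threading.Lock and remembers which thread holds it,
    so a second acquire by the owning thread is reported instead of hanging,
    loosely mimicking "possible recursive locking detected".
    """

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._owner = None
        self._held_at = None

    def acquire(self, where):
        me = threading.get_ident()
        if self._owner == me:
            # Same thread already holds it: a plain mutex would deadlock here.
            raise RuntimeError(
                f"possible recursive locking detected: {self.name} "
                f"at {where}, already held at {self._held_at}"
            )
        self._lock.acquire()
        self._owner = me
        self._held_at = where

    def release(self):
        self._owner = None
        self._held_at = None
        self._lock.release()


rtnl = CheckedLock("rtnl_mutex")


def fake_igc_resume():
    # Hypothetical model of a resume callback that takes the lock itself.
    rtnl.acquire("igc_resume")
    try:
        pass  # reset / reconfigure the NIC here
    finally:
        rtnl.release()


def fake_dev_ethtool():
    # Hypothetical model of an ioctl path that holds the lock and then
    # triggers a runtime resume of the device.
    rtnl.acquire("dev_ethtool")
    try:
        fake_igc_resume()
    finally:
        rtnl.release()
```

Calling `fake_dev_ethtool()` raises `RuntimeError` naming both acquisition sites. With a bare `threading.Lock` the same call would simply block forever, which is roughly what the hung-task reports look like from the outside.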


Cheers,
-- 
Vinicius

* Re: [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset
  2022-07-27 14:37       ` Vinicius Costa Gomes
@ 2022-07-28 17:36         ` James Hogan
  2022-08-04 13:03           ` James Hogan
  0 siblings, 1 reply; 38+ messages in thread
From: James Hogan @ 2022-07-28 17:36 UTC (permalink / raw)
  To: Jesse Brandeburg, Vinicius Costa Gomes; +Cc: intel-wired-lan

Hi,

On Wednesday, 27 July 2022 15:37:09 BST Vinicius Costa Gomes wrote:
> James Hogan <jhogan@kernel.org> writes:
> > On Sunday, 17 July 2022 22:40:59 BST James Hogan wrote:
> >> On Sunday, 17 July 2022 20:59:36 BST you wrote:
> >> > Hi James,
> >> > 
> >> > James Hogan <jhogan@kernel.org> writes:
> >> > > Hi,
> >> > > 
> >> > > I'm getting regular hangs after resume from suspend with the igc
> >> > > driver, for an I225-V (rev 3) on an MSI Pro Z690-A, with version
> >> > > 5.18.11 on archlinux. A few stable versions ago it was possible to get
> >> > > the network back up by removing and reloading the igc driver, however
> >> > > now I get the following, and only a reboot works (which itself hangs
> >> > > before actually restarting the machine, and requires a hard reset).
> >> > 
> >> > Sorry for the delay. I was travelling.
> >> 
> >> No worries
> >> 
> >> > I remember seeing some weird behaviors with PCIe PTM and
> >> > suspend/resume, especially with onboard controllers.
> >> 
> >> It appears that the hardware got itself into a funny state such that
> >> NetworkManager hung as described more often than not on resume, however
> >> without changing kernel it has now settled back into the previous
> >> behaviour of usually working, but occasionally (maybe 1 in 5) the network
> >> wouldn't come back up on resume, with network related things hung until I
> >> unload and reload the igc module.
> >> 
> >> > Can you see if disabling CONFIG_PCIE_PTM in your kernel config changes
> >> > anything? (assuming it's enabled)
> >> 
> >> It is enabled yes. Okay I'll give it a go when I get the chance. I'll
> >> likely have to do a bunch of boot and suspend cycles to try and get it
> >> back into either failure condition.
> > 
> > (sorry somehow dropped others off cc the other day, now adding back)...
> > 
> > I've been running most of this week with 5.18.12-arch1-1, rebuilt with
> > CONFIG_PCIE_PTM=n, however I have now observed both cases.
> > 
> > It failed to bring up the network link a couple of times after resume from
> > suspend, and I managed to remove the igc module and reload it to get it
> > going again.
> > 
> > Another time it failed to come back up, but reloading the module didn't help.
> > 
> > I also hit the igc_tsn_reset hang, but this time it was immediately after
> > boot (possibly a warm reset), where it failed to bring up the network at
> > all. I'll paste the full backtraces of hung tasks below.
> > 
> > I'm wondering whether, since most of the tasks are stuck trying to acquire
> > a mutex, the issue is elsewhere. In some past cases though all the tasks
> > that are dumped are at a mutex_lock...
> 
> Yeah, I agree that it seems like the issue is something else. I would
> suggest starting with the "simple" things: enabling 'CONFIG_PROVE_LOCKING'
> and looking at the first splat. It could be that what you are seeing is
> caused by a deadlock somewhere else.

This is revealing I think (re-enabled PCIE_PTM and enabled PROVE_LOCKING).

In this case it happened within minutes of boot, but a few previous attempts
with several suspend cycles with the same kernel didn't detect the same thing.

NetworkManager[857]: <info>  [1659028974.1752] device (enp6s0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')

============================================
WARNING: possible recursive locking detected
5.18.12-arch1-1 #2 Not tainted
--------------------------------------------
NetworkManager/857 is trying to acquire lock:
ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: igc_resume+0xf6/0x1d0 [igc]

but task is already holding lock:
ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: dev_ethtool+0xaf/0x3080

other info that might help us debug this:
 Possible unsafe locking scenario:
       CPU0
       ----
  lock(rtnl_mutex);
  lock(rtnl_mutex);

 *** DEADLOCK ***
 May be due to missing lock nesting notation
1 lock held by NetworkManager/857:
 #0: ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: dev_ethtool+0xaf/0x3080

stack backtrace:
CPU: 0 PID: 857 Comm: NetworkManager Not tainted 5.18.12-arch1-1 #2 369425cead7bf2331cd4c5d2279465ad4a0fc21f
Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS 1.40 05/17/2022
Call Trace:
 <TASK>
 dump_stack_lvl+0x5f/0x78
 __lock_acquire.cold+0xd4/0x2e5
 ? __lock_acquire+0x3b2/0x1fd0
 lock_acquire+0xc8/0x2d0
 ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
 ? lock_is_held_type+0xaa/0x120
 __mutex_lock+0xb6/0x830
 ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
 ? lockdep_hardirqs_on_prepare+0xdd/0x180
 ? _raw_spin_unlock_irqrestore+0x34/0x50
 ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
 ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
 igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
 pci_pm_runtime_resume+0xab/0xd0
 ? pci_pm_freeze_noirq+0xe0/0xe0
 __rpm_callback+0x41/0x160
 rpm_callback+0x35/0x70
 ? pci_pm_freeze_noirq+0xe0/0xe0
 rpm_resume+0x5eb/0x820
 __pm_runtime_resume+0x4b/0x80
 dev_ethtool+0x128/0x3080
 ? lock_is_held_type+0xaa/0x120
 ? find_held_lock+0x2b/0x80
 ? dev_load+0x57/0x140
 ? lock_release+0xd4/0x2d0
 dev_ioctl+0x155/0x560
 sock_do_ioctl+0xd7/0x120
 sock_ioctl+0x103/0x360
 ? __fget_files+0xd2/0x170
 __x64_sys_ioctl+0x8e/0xc0
 do_syscall_64+0x5c/0x90
 ? do_syscall_64+0x6b/0x90
 ? lockdep_hardirqs_on_prepare+0xdd/0x180
 ? do_syscall_64+0x6b/0x90
 ? lockdep_hardirqs_on_prepare+0xdd/0x180
 ? do_syscall_64+0x6b/0x90
 ? asm_sysvec_apic_timer_interrupt+0xe/0x20
 ? rcu_read_lock_sched_held+0x40/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2c35d077af
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:00007ffd7319afd0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffd7319b2c0 RCX: 00007f2c35d077af
RDX: 00007ffd7319b0f0 RSI: 0000000000008946 RDI: 0000000000000012
RBP: 00007ffd7319b270 R08: 0000000000000000 R09: 00007ffd7319b2c8
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd7319b0f0 R14: 00007ffd7319b0d0 R15: 00007ffd7319b0d0
 </TASK>

Cheers
James



* Re: [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset
  2022-07-28 17:36         ` James Hogan
@ 2022-08-04 13:03           ` James Hogan
  2022-08-04 13:27             ` Paul Menzel
  0 siblings, 1 reply; 38+ messages in thread
From: James Hogan @ 2022-08-04 13:03 UTC (permalink / raw)
  To: Jesse Brandeburg, Vinicius Costa Gomes; +Cc: intel-wired-lan

Hi,

On Thursday, 28 July 2022 18:36:31 BST James Hogan wrote:
> On Wednesday, 27 July 2022 15:37:09 BST Vinicius Costa Gomes wrote:
> > Yeah, I agree that it seems like the issue is something else. I would
> > suggest starting with the "simple" things: enabling 'CONFIG_PROVE_LOCKING'
> > and looking at the first splat. It could be that what you are seeing is
> > caused by a deadlock somewhere else.
> 
> This is revealing I think (re-enabled PCIE_PTM and enabled PROVE_LOCKING).
> 
> In this case it happened within minutes of boot, but a few previous attempts
> with several suspend cycles with the same kernel didn't detect the same
> thing.

I hate to nag, but any thoughts on the lockdep recursive locking warning
below? It seems to indicate that rtnl_mutex is taken recursively:
dev_ethtool() already holds it when the runtime-PM resume it triggers calls
igc_resume(), which tries to take it again. That would certainly seem to
point the finger squarely back at the igc driver.
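For illustration only, one common way to break this kind of recursion is to let the runtime-PM resume path skip taking the lock, on the assumption that its callers (such as dev_ethtool()) may already hold it, while the system-resume path still takes it. The sketch below models that pattern with a non-recursive `threading.Lock` standing in for rtnl_mutex. `resume_common()`, its `rpm` flag and the other function names are hypothetical and are not taken from the igc driver. Skipping the lock is only safe if the work done on that path either does not need it or is guaranteed to run with it already held.

```python
import threading

rtnl = threading.Lock()  # non-recursive stand-in for rtnl_mutex


def reset_hw(log):
    # Placeholder for the hardware reset/reconfiguration done on resume.
    log.append("reset")


def resume_common(log, rpm):
    """Hypothetical shared resume body.

    rpm=True: reached via runtime PM, where the caller may already hold
    the lock, so taking it again would self-deadlock.
    rpm=False: system resume, which takes the lock itself.
    """
    if not rpm:
        rtnl.acquire()
    try:
        reset_hw(log)
    finally:
        if not rpm:
            rtnl.release()


def system_resume(log):
    resume_common(log, rpm=False)


def runtime_resume(log):
    resume_common(log, rpm=True)


def ethtool_ioctl(log):
    # Models dev_ethtool(): holds the lock while waking the device.
    with rtnl:
        runtime_resume(log)
```

With this split, `ethtool_ioctl(log)` completes instead of blocking, and `system_resume(log)` still serialises against other holders of the lock.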

All the best,
James

> 
> NetworkManager[857]: <info>  [1659028974.1752] device (enp6s0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
> 
> ============================================
> WARNING: possible recursive locking detected
> 5.18.12-arch1-1 #2 Not tainted
> --------------------------------------------
> NetworkManager/857 is trying to acquire lock:
> ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: igc_resume+0xf6/0x1d0 [igc]
> 
> but task is already holding lock:
> ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: dev_ethtool+0xaf/0x3080
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>        CPU0
>        ----
>   lock(rtnl_mutex);
>   lock(rtnl_mutex);
> 
>  *** DEADLOCK ***
>  May be due to missing lock nesting notation
> 1 lock held by NetworkManager/857:
>  #0: ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: dev_ethtool+0xaf/0x3080
> 
> stack backtrace:
> CPU: 0 PID: 857 Comm: NetworkManager Not tainted 5.18.12-arch1-1 #2 369425cead7bf2331cd4c5d2279465ad4a0fc21f
> Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS 1.40 05/17/2022
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x5f/0x78
>  __lock_acquire.cold+0xd4/0x2e5
>  ? __lock_acquire+0x3b2/0x1fd0
>  lock_acquire+0xc8/0x2d0
>  ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
>  ? lock_is_held_type+0xaa/0x120
>  __mutex_lock+0xb6/0x830
>  ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
>  ? lockdep_hardirqs_on_prepare+0xdd/0x180
>  ? _raw_spin_unlock_irqrestore+0x34/0x50
>  ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
>  ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
>  igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
>  pci_pm_runtime_resume+0xab/0xd0
>  ? pci_pm_freeze_noirq+0xe0/0xe0
>  __rpm_callback+0x41/0x160
>  rpm_callback+0x35/0x70
>  ? pci_pm_freeze_noirq+0xe0/0xe0
>  rpm_resume+0x5eb/0x820
>  __pm_runtime_resume+0x4b/0x80
>  dev_ethtool+0x128/0x3080
>  ? lock_is_held_type+0xaa/0x120
>  ? find_held_lock+0x2b/0x80
>  ? dev_load+0x57/0x140
>  ? lock_release+0xd4/0x2d0
>  dev_ioctl+0x155/0x560
>  sock_do_ioctl+0xd7/0x120
>  sock_ioctl+0x103/0x360
>  ? __fget_files+0xd2/0x170
>  __x64_sys_ioctl+0x8e/0xc0
>  do_syscall_64+0x5c/0x90
>  ? do_syscall_64+0x6b/0x90
>  ? lockdep_hardirqs_on_prepare+0xdd/0x180
>  ? do_syscall_64+0x6b/0x90
>  ? lockdep_hardirqs_on_prepare+0xdd/0x180
>  ? do_syscall_64+0x6b/0x90
>  ? asm_sysvec_apic_timer_interrupt+0xe/0x20
>  ? rcu_read_lock_sched_held+0x40/0x80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f2c35d077af
> Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> RSP: 002b:00007ffd7319afd0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007ffd7319b2c0 RCX: 00007f2c35d077af
> RDX: 00007ffd7319b0f0 RSI: 0000000000008946 RDI: 0000000000000012
> RBP: 00007ffd7319b270 R08: 0000000000000000 R09: 00007ffd7319b2c8
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffd7319b0f0 R14: 00007ffd7319b0d0 R15: 00007ffd7319b0d0
>  </TASK>
> 
> Cheers
> James




_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset
  2022-08-04 13:03           ` James Hogan
@ 2022-08-04 13:27             ` Paul Menzel
  2022-08-04 21:41                 ` James Hogan
  0 siblings, 1 reply; 38+ messages in thread
From: Paul Menzel @ 2022-08-04 13:27 UTC (permalink / raw)
  To: James Hogan; +Cc: intel-wired-lan, Jesse Brandeburg

Dear James,


Am 04.08.22 um 15:03 schrieb James Hogan:

> On Thursday, 28 July 2022 18:36:31 BST James Hogan wrote:
>> On Wednesday, 27 July 2022 15:37:09 BST Vinicius Costa Gomes wrote:
>>> Yeah, I agree that it seems like the issue is something else. I would
>>> suggest start with the "simple" things, enabling 'CONFIG_PROVE_LOCKING'
>>> and looking at the first splat, it could be that what you are seeing is
>>> caused by a deadlock somewhere else.
>>
>> This is revealing I think (re-enabled PCIE_PTM and enabled PROVE_LOCKING).
>>
>> In this case it happened within minutes of boot, but a few previous attempts
>> with several suspend cycles with the same kernel didn't detect the same
>> thing.
> 
> I hate to nag, but any thoughts on the lockdep recursive locking warning
> below? It seems to indicate a recursive taking of rtnl_mutex in dev_ethtool
> and igc_resume, which would certainly seem to point the finger squarely back at
> the igc driver.

I hope the developers will respond quickly. If it is indeed a
regression and you do not want to wait for the developers, you could
try to bisect the issue. To speed up the test cycles, I recommend trying
to reproduce the issue in QEMU/KVM with the network device passed
through.
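Fast test cycles matter here because `git bisect` is essentially a binary search over the commits between a known-good and a known-bad revision. The number of build-and-test rounds therefore grows only logarithmically with the size of the range. The sketch below is plain C and is not git's implementation. `bisect_first_bad()` and the `bad_from()` oracle are hypothetical. It assumes that once the bug is introduced, it stays present in every later commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test oracle: returns nonzero if `commit` shows the bug.
 * In practice this means "build it, boot it, run suspend/resume cycles". */
typedef int (*bisect_test_fn)(size_t commit, void *ctx);

/* Binary search for the first bad commit in [0, n), assuming commit 0
 * is known good, commit n - 1 is known bad, and the bug, once
 * introduced, stays present. *builds counts test rounds. */
static size_t bisect_first_bad(size_t n, bisect_test_fn is_bad, void *ctx,
			       unsigned *builds)
{
	size_t good = 0, bad = n - 1;

	assert(n >= 2);
	*builds = 0;
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		++*builds;
		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Example oracle: every commit at or after *ctx is bad. */
static int bad_from(size_t commit, void *ctx)
{
	return commit >= *(const size_t *)ctx;
}
```

For 1000 commits with the culprit at index 700, this finds index 700 in at most 10 test rounds.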


Kind regards,

Paul Menzel


>> NetworkManager[857]: <info>  [1659028974.1752] device (enp6s0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
>>
>> ============================================
>> WARNING: possible recursive locking detected
>> 5.18.12-arch1-1 #2 Not tainted
>> --------------------------------------------
>> NetworkManager/857 is trying to acquire lock:
>> ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: igc_resume+0xf6/0x1d0 [igc]
>>
>> but task is already holding lock:
>> ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: dev_ethtool+0xaf/0x3080
>>
>> other info that might help us debug this:
>>   Possible unsafe locking scenario:
>>         CPU0
>>         ----
>>    lock(rtnl_mutex);
>>    lock(rtnl_mutex);
>>
>>   *** DEADLOCK ***
>>   May be due to missing lock nesting notation
>> 1 lock held by NetworkManager/857:
>>   #0: ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: dev_ethtool+0xaf/0x3080
>>
>> stack backtrace:
>> CPU: 0 PID: 857 Comm: NetworkManager Not tainted 5.18.12-arch1-1 #2 369425cead7bf2331cd4c5d2279465ad4a0fc21f
>> Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS 1.40 05/17/2022
>> Call Trace:
>>   <TASK>
>>   dump_stack_lvl+0x5f/0x78
>>   __lock_acquire.cold+0xd4/0x2e5
>>   ? __lock_acquire+0x3b2/0x1fd0
>>   lock_acquire+0xc8/0x2d0
>>   ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
>>   ? lock_is_held_type+0xaa/0x120
>>   __mutex_lock+0xb6/0x830
>>   ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
>>   ? lockdep_hardirqs_on_prepare+0xdd/0x180
>>   ? _raw_spin_unlock_irqrestore+0x34/0x50
>>   ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
>>   ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
>>   igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
>>   pci_pm_runtime_resume+0xab/0xd0
>>   ? pci_pm_freeze_noirq+0xe0/0xe0
>>   __rpm_callback+0x41/0x160
>>   rpm_callback+0x35/0x70
>>   ? pci_pm_freeze_noirq+0xe0/0xe0
>>   rpm_resume+0x5eb/0x820
>>   __pm_runtime_resume+0x4b/0x80
>>   dev_ethtool+0x128/0x3080
>>   ? lock_is_held_type+0xaa/0x120
>>   ? find_held_lock+0x2b/0x80
>>   ? dev_load+0x57/0x140
>>   ? lock_release+0xd4/0x2d0
>>   dev_ioctl+0x155/0x560
>>   sock_do_ioctl+0xd7/0x120
>>   sock_ioctl+0x103/0x360
>>   ? __fget_files+0xd2/0x170
>>   __x64_sys_ioctl+0x8e/0xc0
>>   do_syscall_64+0x5c/0x90
>>   ? do_syscall_64+0x6b/0x90
>>   ? lockdep_hardirqs_on_prepare+0xdd/0x180
>>   ? do_syscall_64+0x6b/0x90
>>   ? lockdep_hardirqs_on_prepare+0xdd/0x180
>>   ? do_syscall_64+0x6b/0x90
>>   ? asm_sysvec_apic_timer_interrupt+0xe/0x20
>>   ? rcu_read_lock_sched_held+0x40/0x80
>>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>> RIP: 0033:0x7f2c35d077af
>> Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
>> RSP: 002b:00007ffd7319afd0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>> RAX: ffffffffffffffda RBX: 00007ffd7319b2c0 RCX: 00007f2c35d077af
>> RDX: 00007ffd7319b0f0 RSI: 0000000000008946 RDI: 0000000000000012
>> RBP: 00007ffd7319b270 R08: 0000000000000000 R09: 00007ffd7319b2c8
>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>> R13: 00007ffd7319b0f0 R14: 00007ffd7319b0d0 R15: 00007ffd7319b0d0
>>   </TASK>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset
  2022-08-04 13:27             ` Paul Menzel
@ 2022-08-04 21:41                 ` James Hogan
  0 siblings, 0 replies; 38+ messages in thread
From: James Hogan @ 2022-08-04 21:41 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Tony Nguyen, Jesse Brandeburg, Vinicius Costa Gomes,
	intel-wired-lan, Sasha Neftin, Aleksandr Loktionov, netdev

On Thursday, 4 August 2022 14:27:24 BST Paul Menzel wrote:
> Am 04.08.22 um 15:03 schrieb James Hogan:
> > On Thursday, 28 July 2022 18:36:31 BST James Hogan wrote:
> >> On Wednesday, 27 July 2022 15:37:09 BST Vinicius Costa Gomes wrote:
> >>> Yeah, I agree that it seems like the issue is something else. I would
> >>> suggest start with the "simple" things, enabling 'CONFIG_PROVE_LOCKING'
> >>> and looking at the first splat, it could be that what you are seeing is
> >>> caused by a deadlock somewhere else.
> >> 
> >> This is revealing I think (re-enabled PCIE_PTM and enabled
> >> PROVE_LOCKING).
> >> 
> >> In this case it happened within minutes of boot, but a few previous
> >> attempts with several suspend cycles with the same kernel didn't detect
> >> the same thing.
> > 
> > I hate to nag, but any thoughts on the lockdep recursive locking warning
> > below? It seems to indicate a recursive taking of rtnl_mutex in dev_ethtool
> > and igc_resume, which would certainly seem to point the finger squarely
> > back at the igc driver.
> 
> I hope the developers will respond quickly. If it is indeed a
> regression and you do not want to wait for the developers, you could
> try to bisect the issue. To speed up the test cycles, I recommend trying
> to reproduce the issue in QEMU/KVM with the network device passed
> through.

Unfortunately it's new hardware for me, so I don't know whether there's a
known-good version of the driver to bisect from. It's been nothing but pain so
far: frequent failed resumes and hangs on shutdown.

However I just did a bit more research and found these dead threads from a 
year ago which appear to pinpoint the issue:
https://lore.kernel.org/all/20210420075406.64105-1-acelan.kao@canonical.com/
https://lore.kernel.org/all/20210809032809.1224002-1-acelan.kao@canonical.com/

They would appear to point to this commit as the one which triggered the 
breakage:
bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")

Cc'ing Sasha and Aleks (they were added to the latter thread).
Also cc'ing netdev, since it seems to relate to interaction with common code?

Cheers
James


> >> NetworkManager[857]: <info>  [1659028974.1752] device (enp6s0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
> >> 
> >> ============================================
> >> WARNING: possible recursive locking detected
> >> 5.18.12-arch1-1 #2 Not tainted
> >> --------------------------------------------
> >> NetworkManager/857 is trying to acquire lock:
> >> ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: igc_resume+0xf6/0x1d0 [igc]
> >> 
> >> but task is already holding lock:
> >> ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: dev_ethtool+0xaf/0x3080
> >> 
> >> other info that might help us debug this:
> >>   Possible unsafe locking scenario:
> >>         CPU0
> >>         ----
> >>    lock(rtnl_mutex);
> >>    lock(rtnl_mutex);
> >>
> >>   *** DEADLOCK ***
> >>   May be due to missing lock nesting notation
> >> 1 lock held by NetworkManager/857:
> >>   #0: ffffffff9f9e9048 (rtnl_mutex){+.+.}-{3:3}, at: dev_ethtool+0xaf/0x3080
> >> 
> >> stack backtrace:
> >> CPU: 0 PID: 857 Comm: NetworkManager Not tainted 5.18.12-arch1-1 #2 369425cead7bf2331cd4c5d2279465ad4a0fc21f
> >> Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS 1.40 05/17/2022
> >> Call Trace:
> >>   <TASK>
> >>   dump_stack_lvl+0x5f/0x78
> >>   __lock_acquire.cold+0xd4/0x2e5
> >>   ? __lock_acquire+0x3b2/0x1fd0
> >>   lock_acquire+0xc8/0x2d0
> >>   ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
> >>   ? lock_is_held_type+0xaa/0x120
> >>   __mutex_lock+0xb6/0x830
> >>   ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
> >>   ? lockdep_hardirqs_on_prepare+0xdd/0x180
> >>   ? _raw_spin_unlock_irqrestore+0x34/0x50
> >>   ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
> >>   ? igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
> >>   igc_resume+0xf6/0x1d0 [igc beed6d83546b18fcf82fbbfaeea59871823bceeb]
> >>   pci_pm_runtime_resume+0xab/0xd0
> >>   ? pci_pm_freeze_noirq+0xe0/0xe0
> >>   __rpm_callback+0x41/0x160
> >>   rpm_callback+0x35/0x70
> >>   ? pci_pm_freeze_noirq+0xe0/0xe0
> >>   rpm_resume+0x5eb/0x820
> >>   __pm_runtime_resume+0x4b/0x80
> >>   dev_ethtool+0x128/0x3080
> >>   ? lock_is_held_type+0xaa/0x120
> >>   ? find_held_lock+0x2b/0x80
> >>   ? dev_load+0x57/0x140
> >>   ? lock_release+0xd4/0x2d0
> >>   dev_ioctl+0x155/0x560
> >>   sock_do_ioctl+0xd7/0x120
> >>   sock_ioctl+0x103/0x360
> >>   ? __fget_files+0xd2/0x170
> >>   __x64_sys_ioctl+0x8e/0xc0
> >>   do_syscall_64+0x5c/0x90
> >>   ? do_syscall_64+0x6b/0x90
> >>   ? lockdep_hardirqs_on_prepare+0xdd/0x180
> >>   ? do_syscall_64+0x6b/0x90
> >>   ? lockdep_hardirqs_on_prepare+0xdd/0x180
> >>   ? do_syscall_64+0x6b/0x90
> >>   ? asm_sysvec_apic_timer_interrupt+0xe/0x20
> >>   ? rcu_read_lock_sched_held+0x40/0x80
> >>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> >> RIP: 0033:0x7f2c35d077af
> >> Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> >> RSP: 002b:00007ffd7319afd0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> >> RAX: ffffffffffffffda RBX: 00007ffd7319b2c0 RCX: 00007f2c35d077af
> >> RDX: 00007ffd7319b0f0 RSI: 0000000000008946 RDI: 0000000000000012
> >> RBP: 00007ffd7319b270 R08: 0000000000000000 R09: 00007ffd7319b2c8
> >> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> >> R13: 00007ffd7319b0f0 R14: 00007ffd7319b0d0 R15: 00007ffd7319b0d0
> >>   </TASK>





^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset
  2022-08-04 21:41                 ` James Hogan
@ 2022-08-04 22:07                   ` James Hogan
  -1 siblings, 0 replies; 38+ messages in thread
From: James Hogan @ 2022-08-04 22:07 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Tony Nguyen, Jesse Brandeburg, Vinicius Costa Gomes,
	intel-wired-lan, Sasha Neftin, Aleksandr Loktionov, netdev

On Thursday, 4 August 2022 22:41:01 BST James Hogan wrote:
> On Thursday, 4 August 2022 14:27:24 BST Paul Menzel wrote:
> > Am 04.08.22 um 15:03 schrieb James Hogan:
> > > On Thursday, 28 July 2022 18:36:31 BST James Hogan wrote:
> > >> On Wednesday, 27 July 2022 15:37:09 BST Vinicius Costa Gomes wrote:
> > >>> Yeah, I agree that it seems like the issue is something else. I would
> > >>> suggest start with the "simple" things, enabling 'CONFIG_PROVE_LOCKING'
> > >>> and looking at the first splat, it could be that what you are seeing is
> > >>> caused by a deadlock somewhere else.
> > >> 
> > >> This is revealing I think (re-enabled PCIE_PTM and enabled
> > >> PROVE_LOCKING).
> > >> 
> > >> In this case it happened within minutes of boot, but a few previous
> > >> attempts with several suspend cycles with the same kernel didn't detect
> > >> the same thing.
> > > 
> > > I hate to nag, but any thoughts on the lockdep recursive locking warning
> > > below? It seems to indicate a recursive taking of rtnl_mutex in dev_ethtool
> > > and igc_resume, which would certainly seem to point the finger squarely
> > > back at the igc driver.
> > 
> > I hope the developers will respond quickly. If it is indeed a
> > regression and you do not want to wait for the developers, you could
> > try to bisect the issue. To speed up the test cycles, I recommend trying
> > to reproduce the issue in QEMU/KVM with the network device passed
> > through.
> 
> Unfortunately it's new hardware for me, so I don't know whether there's a
> known-good version of the driver to bisect from. It's been nothing but pain so
> far: frequent failed resumes and hangs on shutdown.
> 
> However I just did a bit more research and found these dead threads from a
> year ago which appear to pinpoint the issue:
> https://lore.kernel.org/all/20210420075406.64105-1-acelan.kao@canonical.com/
> https://lore.kernel.org/all/20210809032809.1224002-1-acelan.kao@canonical.com/

And I just found this patch from December which may have been masked by the 
PTM issues:
https://lore.kernel.org/netdev/20211201185731.236130-1-vinicius.gomes@intel.com/

I'll build and run with that for a few days and see how it goes.
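A common way to break this kind of recursion is to have system resume and runtime resume share one resume helper, with only the system-resume path taking the lock itself. The `__igc_resume` frames in the later traces in this thread suggest the linked patch may take a similar approach. Its contents are not quoted here, though, so treat that as an assumption. The model below is a single-task C sketch with hypothetical names. The `assert(!rtnl_held)` in `model_rtnl_lock()` stands in for what would be a self-deadlock in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-task model: true while the task holds RTNL. */
static bool rtnl_held;

/* Re-locking would self-deadlock in the kernel; here it aborts instead. */
static void model_rtnl_lock(void)   { assert(!rtnl_held); rtnl_held = true; }
static void model_rtnl_unlock(void) { assert(rtnl_held);  rtnl_held = false; }

/* Shared resume helper. rpm == true means runtime resume, where the
 * caller is assumed to hold RTNL already, so the lock is not retaken. */
static int model_resume_common(bool rpm)
{
	if (!rpm)
		model_rtnl_lock();
	assert(rtnl_held); /* reopening the device expects RTNL */
	/* ... reopen and reattach the net device here ... */
	if (!rpm)
		model_rtnl_unlock();
	return 0;
}

static int model_system_resume(void)  { return model_resume_common(false); }
static int model_runtime_resume(void) { return model_resume_common(true); }

/* The path from the lockdep report: RTNL is held while the device
 * is runtime-resumed synchronously. */
static int model_ethtool_ioctl(void)
{
	model_rtnl_lock();
	int err = model_runtime_resume();
	model_rtnl_unlock();
	return err;
}
```

With the `rpm` flag, `model_ethtool_ioctl()` completes without retaking the lock. `model_system_resume()` still takes and releases the lock on its own.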

Cheers
James



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset
  2022-08-04 22:07                   ` James Hogan
@ 2022-08-05 11:25                     ` James Hogan
  -1 siblings, 0 replies; 38+ messages in thread
From: James Hogan @ 2022-08-05 11:25 UTC (permalink / raw)
  To: Vinicius Costa Gomes, intel-wired-lan, Sasha Neftin, Aleksandr Loktionov
  Cc: Paul Menzel, Tony Nguyen, Jesse Brandeburg, netdev

On Thursday, 4 August 2022 23:07:34 BST James Hogan wrote:
> And I just found this patch from December which may have been masked by the
> PTM issues:
> https://lore.kernel.org/netdev/20211201185731.236130-1-vinicius.gomes@intel.com/
> 
> I'll build and run with that for a few days and see how it goes.

I gave it a good hammering yesterday evening with suspend/resume cycles, and
it didn't lock up. However, it still failed to bring the network up a couple
of times, requiring me to unload and reload the driver.

The only kernel log splats I saw were an RTNL assertion failure (the RTNL mutex
was not held in the igc_runtime_resume path) and a suspicious RCU usage
warning, both pasted below.
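Here is one plausible reading of the RTNL assertion, which this thread does not confirm. In that trace the runtime resume was started from the `pm` workqueue (`pm_runtime_work`), not from an ethtool ioctl, so nobody held RTNL. A runtime-resume path that skips `rtnl_lock()` would still call into code that expects RTNL to be held, here `netif_set_real_num_tx_queues()`. The C sketch below models the two callers using hypothetical names. Like the kernel's `ASSERT_RTNL()`, its check only warns and carries on rather than stopping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model state: whether RTNL is currently held, and how
 * many times an ASSERT_RTNL()-style check has fired. */
static bool rtnl_held;
static unsigned rtnl_assert_failures;

/* Like the kernel check, this only warns and carries on. */
static void model_assert_rtnl(const char *where)
{
	if (!rtnl_held) {
		rtnl_assert_failures++;
		fprintf(stderr, "RTNL: assertion failed at %s\n", where);
	}
}

/* Stands in for a TX queue-count update that requires RTNL. */
static void model_set_real_num_tx_queues(void)
{
	model_assert_rtnl("model_set_real_num_tx_queues");
}

/* Runtime-resume helper that assumes its caller already holds RTNL,
 * so it never takes the lock itself. */
static void model_runtime_resume(void)
{
	model_set_real_num_tx_queues();
}

/* Caller 1: an ethtool-style ioctl that holds RTNL around the resume. */
static void model_ethtool_path(void)
{
	assert(!rtnl_held); /* RTNL is not recursive */
	rtnl_held = true;
	model_runtime_resume();
	rtnl_held = false;
}

/* Caller 2: an asynchronous runtime-PM work item; nothing holds RTNL. */
static void model_pm_work_path(void)
{
	model_runtime_resume();
}
```

The ethtool-style path passes the check. The workqueue path trips it once.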

I'll keep running with that patch and lockdep enabled (based on
5.18.16-arch1-1) and report back any further issues.

Cheers
James

------------[ cut here ]------------
RTNL: assertion failed at net/core/dev.c (2886)
WARNING: CPU: 0 PID: 7752 at net/core/dev.c:2886 netif_set_real_num_tx_queues+0x1f0/0x210
Modules linked in: rfcomm intel_rapl_msr ee1004 spi_nor iTCO_wdt intel_pmc_bxt mtd iTCO_vendor_support mei_pxp mei_hdcp cmac algif_hash algif_skcipher af_alg bnep pmt_telemetry pmt_class wmi_bmof mxm_wmi intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass rapl intel_cstate intel_uncore pcspkr snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_hda_codec_realtek snd_sof_utils snd_soc_hdac_hda snd_hda_codec_generic snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus ledtrig_audio uvcvideo snd_usb_audio snd_soc_core videobuf2_vmalloc videobuf2_memops snd_usbmidi_lib videobuf2_v4l2 snd_compress i2c_i801 snd_rawmidi spi_intel_pci ac97_bus igc(-) spi_intel videobuf2_common snd_pcm_dmaengine i2c_smbus snd_seq_device mei_me snd_hda_codec_hdmi mei cdc_acm videodev snd_hda_intel snd_intel_dspcfg
 snd_intel_sdw_acpi mc amdgpu mousedev i915 snd_hda_codec snd_hda_core btusb snd_hwdep btrtl snd_pcm btbcm gpu_sched drm_buddy joydev btintel snd_timer drm_ttm_helper btmtk snd ttm intel_vsec drm_dp_helper soundcore intel_gtt serial_multi_instantiate wmi video bluetooth ecdh_generic acpi_tad rfkill coretemp acpi_pad nls_iso8859_1 vfat fat mac_hid ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_multiport xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip6table_filter ip6_tables iptable_filter i2c_dev sg crypto_user fuse bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_microsoft ff_memless dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm rng_core dm_mod usbhid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd nvme cryptd sr_mod xhci_pci nvme_core cdrom xhci_pci_renesas
CPU: 0 PID: 7752 Comm: kworker/0:1 Not tainted 5.18.16-arch1-1 #3 2927cbed739f932be66f137e6808a2714da26c25
Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS 1.40 05/17/2022
Workqueue: pm pm_runtime_work
RIP: 0010:netif_set_real_num_tx_queues+0x1f0/0x210
Code: f8 f7 5f 01 00 0f 85 90 fe ff ff ba 46 0b 00 00 48 c7 c6 6e 6e 6f 82 48 c7 c7 70 a5 72 82 c6 05 d8 f7 5f 01 01 e8 1f d6 25 00 <0f> 0b e9 6a fe ff ff b8 ea ff ff ff e9 46 fe ff ff 66 66 2e 0f 1f
RSP: 0018:ffffa25e823dbc98 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff95a5668fa000 RCX: 0000000000000027
RDX: ffff95accfc21a28 RSI: 0000000000000001 RDI: ffff95accfc21a20
RBP: 0000000000000004 R08: 0000000000000000 R09: ffffa25e823dbaa0
R10: 0000000000000003 R11: ffff95acf07ac2e8 R12: 0000000000000001
R13: 0000000000000004 R14: ffff95a5668fa000 R15: ffff95a5691bc1e8
FS:  0000000000000000(0000) GS:ffff95accfc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feec86e5c90 CR3: 0000000252fdc001 CR4: 0000000000f70ef0
PKRU: 55555554
Call Trace:
 <TASK>
 __igc_open+0x40a/0x660 [igc 73e11f9f5110389b26a5a274cff80c9ddf9bab7a]
 __igc_resume+0x133/0x240 [igc 73e11f9f5110389b26a5a274cff80c9ddf9bab7a]
 ? pci_pme_active+0xa5/0x1a0
 pci_pm_runtime_resume+0xab/0xd0
 ? pci_pm_freeze_noirq+0xf0/0xf0
 __rpm_callback+0x41/0x170
 rpm_callback+0x35/0x70
 ? pci_pm_freeze_noirq+0xf0/0xf0
 rpm_resume+0x5ee/0x820
 pm_runtime_work+0x7c/0xb0
 process_one_work+0x276/0x570
 worker_thread+0x53/0x390
 ? _raw_spin_unlock_irqrestore+0x34/0x60
 ? process_one_work+0x570/0x570
 kthread+0xdb/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
irq event stamp: 128847
hardirqs last  enabled at (128853): [<ffffffff81132dde>] __up_console_sem+0x5e/0x70
hardirqs last disabled at (128858): [<ffffffff81132dc3>] __up_console_sem+0x43/0x70
softirqs last  enabled at (125404): [<ffffffff810a5533>] __irq_exit_rcu+0xa3/0xd0
softirqs last disabled at (125395): [<ffffffff810a5533>] __irq_exit_rcu+0xa3/0xd0
---[ end trace 0000000000000000 ]---

=============================
WARNING: suspicious RCU usage
5.18.16-arch1-1 #3 Tainted: G        W   
-----------------------------
net/sched/sch_generic.c:1389 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1 
2 locks held by kworker/0:1/7752:
 #0: ffff95a540ba8b38 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x1f5/0x570
 #1: ffffa25e823dbe78 ((work_completion)(&dev->power.work)){+.+.}-{0:0}, at: process_one_work+0x1f5/0x570

stack backtrace:
CPU: 0 PID: 7752 Comm: kworker/0:1 Tainted: G        W         5.18.16-arch1-1 #3 2927cbed739f932be66f137e6808a2714da26c25
Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS 1.40 05/17/2022
Workqueue: pm pm_runtime_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x5f/0x7b
 dev_qdisc_change_real_num_tx+0x68/0x80
 netif_set_real_num_tx_queues+0x8d/0x210
 __igc_open+0x40a/0x660 [igc 73e11f9f5110389b26a5a274cff80c9ddf9bab7a]
 __igc_resume+0x133/0x240 [igc 73e11f9f5110389b26a5a274cff80c9ddf9bab7a]
 ? pci_pme_active+0xa5/0x1a0
 pci_pm_runtime_resume+0xab/0xd0
 ? pci_pm_freeze_noirq+0xf0/0xf0
 __rpm_callback+0x41/0x170
 rpm_callback+0x35/0x70
 ? pci_pm_freeze_noirq+0xf0/0xf0
 rpm_resume+0x5ee/0x820
 pm_runtime_work+0x7c/0xb0
 process_one_work+0x276/0x570
 worker_thread+0x53/0x390
 ? _raw_spin_unlock_irqrestore+0x34/0x60
 ? process_one_work+0x570/0x570
 kthread+0xdb/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset
@ 2022-08-05 11:25                     ` James Hogan
  0 siblings, 0 replies; 38+ messages in thread
From: James Hogan @ 2022-08-05 11:25 UTC (permalink / raw)
  To: Vinicius Costa Gomes, intel-wired-lan, Sasha Neftin, Aleksandr Loktionov
  Cc: Paul Menzel, Jesse Brandeburg, netdev

On Thursday, 4 August 2022 23:07:34 BST James Hogan wrote:
> And I just found this patch from December which may have been masked by the
> PTM issues:
> https://lore.kernel.org/netdev/20211201185731.236130-1-vinicius.gomes@intel.com/
> 
> I'll build and run with that for a few days and see how it goes.

I gave it a good hammering yesterday evening with suspend/resume cycles, and
it didn't lock up. However, it did still fail to bring the network up a couple
of times, requiring me to unload and reload the driver.

The only kernel log splats I saw were an RTNL assertion failure (the RTNL
mutex wasn't held) in the igc_runtime_resume path and a suspicious RCU usage
warning, both pasted below.

I'll keep running with that patch and lockdep enabled (based on
5.18.16-arch1-1) and report back any further issues.

Cheers
James

------------[ cut here ]------------
RTNL: assertion failed at net/core/dev.c (2886)
WARNING: CPU: 0 PID: 7752 at net/core/dev.c:2886 netif_set_real_num_tx_queues+0x1f0/0x210
Modules linked in: rfcomm intel_rapl_msr ee1004 spi_nor iTCO_wdt intel_pmc_bxt mtd iTCO_vendor_support mei_pxp mei_hdcp cmac algif_hash algif_skcipher af_alg bnep pmt_telemetry pmt_class wmi_bmof mxm_wmi intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass rapl intel_cstate intel_uncore pcspkr snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_hda_codec_realtek snd_sof_utils snd_soc_hdac_hda snd_hda_codec_generic snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus ledtrig_audio uvcvideo snd_usb_audio snd_soc_core videobuf2_vmalloc videobuf2_memops snd_usbmidi_lib videobuf2_v4l2 snd_compress i2c_i801 snd_rawmidi spi_intel_pci ac97_bus igc(-) spi_intel videobuf2_common snd_pcm_dmaengine i2c_smbus snd_seq_device mei_me snd_hda_codec_hdmi mei cdc_acm videodev snd_hda_intel snd_intel_dspcfg
 snd_intel_sdw_acpi mc amdgpu mousedev i915 snd_hda_codec snd_hda_core btusb snd_hwdep btrtl snd_pcm btbcm gpu_sched drm_buddy joydev btintel snd_timer drm_ttm_helper btmtk snd ttm intel_vsec drm_dp_helper soundcore intel_gtt serial_multi_instantiate wmi video bluetooth ecdh_generic acpi_tad rfkill coretemp acpi_pad nls_iso8859_1 vfat fat mac_hid ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_multiport xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip6table_filter ip6_tables iptable_filter i2c_dev sg crypto_user fuse bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_microsoft ff_memless dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm rng_core dm_mod usbhid crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd nvme cryptd sr_mod xhci_pci nvme_core cdrom xhci_pci_renesas
CPU: 0 PID: 7752 Comm: kworker/0:1 Not tainted 5.18.16-arch1-1 #3 2927cbed739f932be66f137e6808a2714da26c25
Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS 1.40 05/17/2022
Workqueue: pm pm_runtime_work
RIP: 0010:netif_set_real_num_tx_queues+0x1f0/0x210
Code: f8 f7 5f 01 00 0f 85 90 fe ff ff ba 46 0b 00 00 48 c7 c6 6e 6e 6f 82 48 c7 c7 70 a5 72 82 c6 05 d8 f7 5f 01 01 e8 1f d6 25 00 <0f> 0b e9 6a fe ff ff b8 ea ff ff ff e9 46 fe ff ff 66 66 2e 0f 1f
RSP: 0018:ffffa25e823dbc98 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff95a5668fa000 RCX: 0000000000000027
RDX: ffff95accfc21a28 RSI: 0000000000000001 RDI: ffff95accfc21a20
RBP: 0000000000000004 R08: 0000000000000000 R09: ffffa25e823dbaa0
R10: 0000000000000003 R11: ffff95acf07ac2e8 R12: 0000000000000001
R13: 0000000000000004 R14: ffff95a5668fa000 R15: ffff95a5691bc1e8
FS:  0000000000000000(0000) GS:ffff95accfc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feec86e5c90 CR3: 0000000252fdc001 CR4: 0000000000f70ef0
PKRU: 55555554
Call Trace:
 <TASK>
 __igc_open+0x40a/0x660 [igc 73e11f9f5110389b26a5a274cff80c9ddf9bab7a]
 __igc_resume+0x133/0x240 [igc 73e11f9f5110389b26a5a274cff80c9ddf9bab7a]
 ? pci_pme_active+0xa5/0x1a0
 pci_pm_runtime_resume+0xab/0xd0
 ? pci_pm_freeze_noirq+0xf0/0xf0
 __rpm_callback+0x41/0x170
 rpm_callback+0x35/0x70
 ? pci_pm_freeze_noirq+0xf0/0xf0
 rpm_resume+0x5ee/0x820
 pm_runtime_work+0x7c/0xb0
 process_one_work+0x276/0x570
 worker_thread+0x53/0x390
 ? _raw_spin_unlock_irqrestore+0x34/0x60
 ? process_one_work+0x570/0x570
 kthread+0xdb/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
irq event stamp: 128847
hardirqs last  enabled at (128853): [<ffffffff81132dde>] __up_console_sem+0x5e/0x70
hardirqs last disabled at (128858): [<ffffffff81132dc3>] __up_console_sem+0x43/0x70
softirqs last  enabled at (125404): [<ffffffff810a5533>] __irq_exit_rcu+0xa3/0xd0
softirqs last disabled at (125395): [<ffffffff810a5533>] __irq_exit_rcu+0xa3/0xd0
---[ end trace 0000000000000000 ]---

=============================
WARNING: suspicious RCU usage
5.18.16-arch1-1 #3 Tainted: G        W   
-----------------------------
net/sched/sch_generic.c:1389 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1 
2 locks held by kworker/0:1/7752:
 #0: ffff95a540ba8b38 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x1f5/0x570
 #1: ffffa25e823dbe78 ((work_completion)(&dev->power.work)){+.+.}-{0:0}, at: process_one_work+0x1f5/0x570

stack backtrace:
CPU: 0 PID: 7752 Comm: kworker/0:1 Tainted: G        W         5.18.16-arch1-1 #3 2927cbed739f932be66f137e6808a2714da26c25
Hardware name: Micro-Star International Co., Ltd. MS-7D25/PRO Z690-A DDR4(MS-7D25), BIOS 1.40 05/17/2022
Workqueue: pm pm_runtime_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x5f/0x7b
 dev_qdisc_change_real_num_tx+0x68/0x80
 netif_set_real_num_tx_queues+0x8d/0x210
 __igc_open+0x40a/0x660 [igc 73e11f9f5110389b26a5a274cff80c9ddf9bab7a]
 __igc_resume+0x133/0x240 [igc 73e11f9f5110389b26a5a274cff80c9ddf9bab7a]
 ? pci_pme_active+0xa5/0x1a0
 pci_pm_runtime_resume+0xab/0xd0
 ? pci_pm_freeze_noirq+0xf0/0xf0
 __rpm_callback+0x41/0x170
 rpm_callback+0x35/0x70
 ? pci_pm_freeze_noirq+0xf0/0xf0
 rpm_resume+0x5ee/0x820
 pm_runtime_work+0x7c/0xb0
 process_one_work+0x276/0x570
 worker_thread+0x53/0x390
 ? _raw_spin_unlock_irqrestore+0x34/0x60
 ? process_one_work+0x570/0x570
 kthread+0xdb/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>


_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 38+ messages in thread
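Both splats above come from the same call chain: pm_runtime_work, running in a kworker, calls __igc_resume, which calls __igc_open and then netif_set_real_num_tx_queues. The lockdep report lists only the two workqueue locks as held, so RTNL is not held on that path. By contrast, the ethtool ioctl path in the original report reaches the resume callback from dev_ethtool, which (as that trace and the later patch title suggest) already runs under RTNL in these 5.18 kernels. The following is a minimal single-threaded model, not kernel code: `model_rtnl_held`, `model_assert_rtnl` and the path functions are hypothetical stand-ins for RTNL, ASSERT_RTNL() and the two entry points seen in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for RTNL: only records whether it is held. */
static bool model_rtnl_held;

/* Models ASSERT_RTNL(): returns false where the kernel would print
 * "RTNL: assertion failed" followed by a WARNING. */
static bool model_assert_rtnl(void)
{
	return model_rtnl_held;
}

/* Models __igc_open() reaching netif_set_real_num_tx_queues(). */
static bool model_open(void)
{
	return model_assert_rtnl();
}

/* Models a runtime resume that reaches __igc_open() without taking
 * RTNL itself, as the lockdep output above shows for the kworker. */
static bool model_runtime_resume(void)
{
	return model_open();
}

/* ethtool ioctl path: the caller already holds RTNL around dev_ethtool(). */
static bool model_ethtool_path(void)
{
	model_rtnl_held = true;
	bool ok = model_runtime_resume();
	model_rtnl_held = false;
	return ok;
}

/* pm_runtime_work path: a kworker holding only workqueue locks. */
static bool model_pm_worker_path(void)
{
	return model_runtime_resume();
}
```

Under these assumptions, the same resume code satisfies the assertion when entered from the ethtool ioctl and trips it when entered from the PM worker, which is the kind of split the two traces in this thread point at.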

* [Intel-wired-lan] [PATCH] igc: fix deadlock caused by taking RTNL in RPM resume path
  2022-08-05 11:25                     ` James Hogan
@ 2022-08-11 15:13                       ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 38+ messages in thread
From: Vinicius Costa Gomes @ 2022-08-11 15:13 UTC (permalink / raw)
  To: jhogan
  Cc: Paul Menzel, netdev, Jesse Brandeburg, Aleksandr Loktionov,
	intel-wired-lan

An RTNL deadlock was reported in the igc driver, causing problems during
suspend/resume.

The solution is similar to commit ac8c58f5b535 ("igb: fix deadlock
caused by taking RTNL in RPM resume path").

Reported-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
Hi James,

Thanks to your investigation I found commit ac8c58f5b535, and it looks
like it could solve the issue you are seeing.

Could you please check whether this patch helps? It has only been compile-
and boot-tested.

Sorry for the delay; I am travelling.

Cheers,


 drivers/net/ethernet/intel/igc/igc_main.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index ebff0e04045d..5079dc581d8d 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6600,7 +6600,7 @@ static void igc_deliver_wake_packet(struct net_device *netdev)
 	netif_rx(skb);
 }
 
-static int __maybe_unused igc_resume(struct device *dev)
+static int __maybe_unused __igc_resume(struct device *dev, bool rpm)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct net_device *netdev = pci_get_drvdata(pdev);
@@ -6642,23 +6642,30 @@ static int __maybe_unused igc_resume(struct device *dev)
 
 	wr32(IGC_WUS, ~0);
 
-	rtnl_lock();
+	if (!rpm)
+		rtnl_lock();
 	if (!err && netif_running(netdev))
 		err = __igc_open(netdev, true);
 
 	if (!err)
 		netif_device_attach(netdev);
-	rtnl_unlock();
+	if (!rpm)
+		rtnl_unlock();
 
 	return err;
 }
 
 static int __maybe_unused igc_runtime_resume(struct device *dev)
 {
-	return igc_resume(dev);
+	return __igc_resume(dev, true);
 }
 
-static int __maybe_unused igc_suspend(struct device *dev)
+static int __maybe_unused igc_resume(struct device *dev)
+{
+	return __igc_resume(dev, false);
+}
+
+static int __maybe_unused __igc_suspend(struct device *dev)
 {
 	return __igc_shutdown(to_pci_dev(dev), NULL, 0);
 }
@@ -6719,7 +6726,7 @@ static pci_ers_result_t igc_io_error_detected(struct pci_dev *pdev,
  *  @pdev: Pointer to PCI device
  *
  *  Restart the card from scratch, as if from a cold-boot. Implementation
- *  resembles the first-half of the igc_resume routine.
+ *  resembles the first-half of the __igc_resume routine.
  **/
 static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev)
 {
@@ -6758,7 +6765,7 @@ static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev)
  *
  *  This callback is called when the error recovery driver tells us that
  *  its OK to resume normal operation. Implementation resembles the
- *  second-half of the igc_resume routine.
+ *  second-half of the __igc_resume routine.
  */
 static void igc_io_resume(struct pci_dev *pdev)
 {
-- 
2.37.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread
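The patch above makes RTNL acquisition conditional: system resume (igc_resume) passes rpm=false and takes RTNL, while runtime resume (igc_runtime_resume) passes rpm=true and skips it. Below is a single-threaded model of that control flow with hypothetical names; the model lock reports -EDEADLK where the real mutex would block forever (the hang in the original report), and the -EPERM returned by `model_open` is an arbitrary marker for "ASSERT_RTNL() would warn here".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for RTNL. */
static bool rtnl_held;

/* Non-recursive lock: relocking by the holder is reported as -EDEADLK
 * here; a real kernel mutex would simply block forever. */
static int model_rtnl_lock(void)
{
	if (rtnl_held)
		return -EDEADLK;
	rtnl_held = true;
	return 0;
}

static void model_rtnl_unlock(void)
{
	rtnl_held = false;
}

/* Models __igc_open(): 0 if RTNL is held, -EPERM where ASSERT_RTNL()
 * in netif_set_real_num_tx_queues() would warn. */
static int model_open(void)
{
	return rtnl_held ? 0 : -EPERM;
}

/* Mirrors the shape of __igc_resume(dev, rpm) from the patch. */
static int model_resume(bool rpm)
{
	int err;

	if (!rpm) {
		err = model_rtnl_lock();
		if (err)
			return err;
	}
	err = model_open();
	if (!rpm)
		model_rtnl_unlock();
	return err;
}

/* Caller already holding RTNL (ethtool ioctl) triggers a resume.
 * rpm=false models the pre-patch code, rpm=true the patched code. */
static int model_ethtool_path(bool rpm)
{
	(void)model_rtnl_lock();
	int err = model_resume(rpm);
	model_rtnl_unlock();
	return err;
}
```

In this model the flag removes the self-deadlock on the ethtool path and leaves system resume unchanged, but a runtime resume started by the pm_runtime_work kworker (`model_resume(true)` with no lock held) still reaches `model_open` without RTNL, the same situation as the RTNL assertion James reported from that worker earlier in the thread.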

* Re: [Intel-wired-lan] [PATCH] igc: fix deadlock caused by taking RTNL in RPM resume path
  2022-08-11 15:13                       ` Vinicius Costa Gomes
@ 2022-08-11 18:58                         ` kernel test robot
  -1 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2022-08-11 18:58 UTC (permalink / raw)
  To: Vinicius Costa Gomes, jhogan
  Cc: llvm, kbuild-all, Paul Menzel, netdev, Jesse Brandeburg,
	Aleksandr Loktionov, intel-wired-lan

Hi Vinicius,

I love your patch! Yet something to improve:

[auto build test ERROR on tnguy-next-queue/dev-queue]
[also build test ERROR on linus/master v5.19 next-20220811]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Vinicius-Costa-Gomes/igc-fix-deadlock-caused-by-taking-RTNL-in-RPM-resume-path/20220811-232032
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git dev-queue
config: i386-randconfig-a013 (https://download.01.org/0day-ci/archive/20220812/202208120244.a7CKRiFy-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 5f1c7e2cc5a3c07cbc2412e851a7283c1841f520)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/61ed7ed758f23a10549c5d4fdc82ef9356281cbf
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Vinicius-Costa-Gomes/igc-fix-deadlock-caused-by-taking-RTNL-in-RPM-resume-path/20220811-232032
        git checkout 61ed7ed758f23a10549c5d4fdc82ef9356281cbf
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/net/ethernet/intel/igc/

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/net/ethernet/intel/igc/igc_main.c:6838:26: error: use of undeclared identifier 'igc_suspend'; did you mean '__igc_suspend'?
           SET_SYSTEM_SLEEP_PM_OPS(igc_suspend, igc_resume)
                                   ^~~~~~~~~~~
                                   __igc_suspend
   include/linux/pm.h:343:22: note: expanded from macro 'SET_SYSTEM_SLEEP_PM_OPS'
           SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
                               ^
   include/linux/pm.h:313:26: note: expanded from macro 'SYSTEM_SLEEP_PM_OPS'
           .suspend = pm_sleep_ptr(suspend_fn), \
                                   ^
   include/linux/pm.h:439:65: note: expanded from macro 'pm_sleep_ptr'
   #define pm_sleep_ptr(_ptr) PTR_IF(IS_ENABLED(CONFIG_PM_SLEEP), (_ptr))
                                                                   ^
   include/linux/kernel.h:57:38: note: expanded from macro 'PTR_IF'
   #define PTR_IF(cond, ptr)       ((cond) ? (ptr) : NULL)
                                              ^
   drivers/net/ethernet/intel/igc/igc_main.c:6706:27: note: '__igc_suspend' declared here
   static int __maybe_unused __igc_suspend(struct device *dev)
                             ^
>> drivers/net/ethernet/intel/igc/igc_main.c:6838:26: error: use of undeclared identifier 'igc_suspend'; did you mean '__igc_suspend'?
           SET_SYSTEM_SLEEP_PM_OPS(igc_suspend, igc_resume)
                                   ^~~~~~~~~~~
                                   __igc_suspend
   include/linux/pm.h:343:22: note: expanded from macro 'SET_SYSTEM_SLEEP_PM_OPS'
           SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
                               ^
   include/linux/pm.h:315:25: note: expanded from macro 'SYSTEM_SLEEP_PM_OPS'
           .freeze = pm_sleep_ptr(suspend_fn), \
                                  ^
   include/linux/pm.h:439:65: note: expanded from macro 'pm_sleep_ptr'
   #define pm_sleep_ptr(_ptr) PTR_IF(IS_ENABLED(CONFIG_PM_SLEEP), (_ptr))
                                                                   ^
   include/linux/kernel.h:57:38: note: expanded from macro 'PTR_IF'
   #define PTR_IF(cond, ptr)       ((cond) ? (ptr) : NULL)
                                              ^
   drivers/net/ethernet/intel/igc/igc_main.c:6706:27: note: '__igc_suspend' declared here
   static int __maybe_unused __igc_suspend(struct device *dev)
                             ^
>> drivers/net/ethernet/intel/igc/igc_main.c:6838:26: error: use of undeclared identifier 'igc_suspend'; did you mean '__igc_suspend'?
           SET_SYSTEM_SLEEP_PM_OPS(igc_suspend, igc_resume)
                                   ^~~~~~~~~~~
                                   __igc_suspend
   include/linux/pm.h:343:22: note: expanded from macro 'SET_SYSTEM_SLEEP_PM_OPS'
           SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
                               ^
   include/linux/pm.h:317:27: note: expanded from macro 'SYSTEM_SLEEP_PM_OPS'
           .poweroff = pm_sleep_ptr(suspend_fn), \
                                    ^
   include/linux/pm.h:439:65: note: expanded from macro 'pm_sleep_ptr'
   #define pm_sleep_ptr(_ptr) PTR_IF(IS_ENABLED(CONFIG_PM_SLEEP), (_ptr))
                                                                   ^
   include/linux/kernel.h:57:38: note: expanded from macro 'PTR_IF'
   #define PTR_IF(cond, ptr)       ((cond) ? (ptr) : NULL)
                                              ^
   drivers/net/ethernet/intel/igc/igc_main.c:6706:27: note: '__igc_suspend' declared here
   static int __maybe_unused __igc_suspend(struct device *dev)
                             ^
   3 errors generated.


vim +6838 drivers/net/ethernet/intel/igc/igc_main.c

bc23aa949aeba0 Sasha Neftin 2020-01-29  6835  
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6836  #ifdef CONFIG_PM
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6837  static const struct dev_pm_ops igc_pm_ops = {
9513d2a5dc7f3f Sasha Neftin 2019-11-14 @6838  	SET_SYSTEM_SLEEP_PM_OPS(igc_suspend, igc_resume)
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6839  	SET_RUNTIME_PM_OPS(igc_runtime_suspend, igc_runtime_resume,
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6840  			   igc_runtime_idle)
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6841  };
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6842  #endif
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6843  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 38+ messages in thread
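The robot reports three copies of the same error because, as its expansion notes show, SET_SYSTEM_SLEEP_PM_OPS feeds suspend_fn into the .suspend, .freeze and .poweroff members (pm.h lines 313, 315 and 317), while the patch renamed igc_suspend to __igc_suspend without updating the igc_pm_ops table at line 6838. The userspace model below uses hypothetical names (`model_pm_ops`, `MODEL_SLEEP_PM_OPS`) rather than the kernel's struct dev_pm_ops, omits the pm_sleep_ptr()/CONFIG_PM_SLEEP gating, and assumes the resume side mirrors the suspend side (.resume, .thaw, .restore). It only illustrates that one macro argument lands in three initializers, so the table has to name the function that exists after the rename (or the rename has to be dropped).

```c
#include <assert.h>
#include <stddef.h>

struct device; /* opaque, as in the kernel */

/* Hypothetical subset of struct dev_pm_ops. */
struct model_pm_ops {
	int (*suspend)(struct device *dev);
	int (*resume)(struct device *dev);
	int (*freeze)(struct device *dev);
	int (*thaw)(struct device *dev);
	int (*poweroff)(struct device *dev);
	int (*restore)(struct device *dev);
};

/* Models SYSTEM_SLEEP_PM_OPS(): each argument is used three times, so
 * an undeclared suspend_fn yields three separate compiler errors. */
#define MODEL_SLEEP_PM_OPS(suspend_fn, resume_fn)       \
	.suspend = (suspend_fn), .resume = (resume_fn), \
	.freeze = (suspend_fn), .thaw = (resume_fn),    \
	.poweroff = (suspend_fn), .restore = (resume_fn)

/* Stand-ins for __igc_suspend() and igc_resume() after the patch. */
static int model_suspend(struct device *dev) { (void)dev; return 0; }
static int model_resume(struct device *dev) { (void)dev; return 0; }

/* The table must reference the name that actually exists; passing a
 * removed name here would reproduce the robot's error three times. */
static const struct model_pm_ops model_igc_pm_ops = {
	MODEL_SLEEP_PM_OPS(model_suspend, model_resume)
};
```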

* Re: [Intel-wired-lan] [PATCH] igc: fix deadlock caused by taking RTNL in RPM resume path
  2022-08-11 15:13                       ` Vinicius Costa Gomes
@ 2022-08-11 19:59                         ` kernel test robot
  -1 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2022-08-11 19:59 UTC (permalink / raw)
  To: Vinicius Costa Gomes, jhogan
  Cc: kbuild-all, Paul Menzel, netdev, Jesse Brandeburg,
	Aleksandr Loktionov, intel-wired-lan

Hi Vinicius,

I love your patch! Yet something to improve:

[auto build test ERROR on tnguy-next-queue/dev-queue]
[also build test ERROR on linus/master v5.19 next-20220811]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Vinicius-Costa-Gomes/igc-fix-deadlock-caused-by-taking-RTNL-in-RPM-resume-path/20220811-232032
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git dev-queue
config: x86_64-rhel-8.3-kselftests (https://download.01.org/0day-ci/archive/20220812/202208120359.pPxeIJNZ-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/61ed7ed758f23a10549c5d4fdc82ef9356281cbf
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Vinicius-Costa-Gomes/igc-fix-deadlock-caused-by-taking-RTNL-in-RPM-resume-path/20220811-232032
        git checkout 61ed7ed758f23a10549c5d4fdc82ef9356281cbf
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/percpu.h:27,
                    from arch/x86/include/asm/nospec-branch.h:14,
                    from arch/x86/include/asm/paravirt_types.h:40,
                    from arch/x86/include/asm/ptrace.h:97,
                    from arch/x86/include/asm/math_emu.h:5,
                    from arch/x86/include/asm/processor.h:13,
                    from arch/x86/include/asm/timex.h:5,
                    from include/linux/timex.h:67,
                    from include/linux/time32.h:13,
                    from include/linux/time.h:60,
                    from include/linux/stat.h:19,
                    from include/linux/module.h:13,
                    from drivers/net/ethernet/intel/igc/igc_main.c:4:
>> drivers/net/ethernet/intel/igc/igc_main.c:6838:33: error: 'igc_suspend' undeclared here (not in a function); did you mean 'dpm_suspend'?
    6838 |         SET_SYSTEM_SLEEP_PM_OPS(igc_suspend, igc_resume)
         |                                 ^~~~~~~~~~~
   include/linux/kernel.h:57:44: note: in definition of macro 'PTR_IF'
      57 | #define PTR_IF(cond, ptr)       ((cond) ? (ptr) : NULL)
         |                                            ^~~
   include/linux/pm.h:313:20: note: in expansion of macro 'pm_sleep_ptr'
     313 |         .suspend = pm_sleep_ptr(suspend_fn), \
         |                    ^~~~~~~~~~~~
   include/linux/pm.h:343:9: note: in expansion of macro 'SYSTEM_SLEEP_PM_OPS'
     343 |         SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn)
         |         ^~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/intel/igc/igc_main.c:6838:9: note: in expansion of macro 'SET_SYSTEM_SLEEP_PM_OPS'
    6838 |         SET_SYSTEM_SLEEP_PM_OPS(igc_suspend, igc_resume)
         |         ^~~~~~~~~~~~~~~~~~~~~~~


vim +6838 drivers/net/ethernet/intel/igc/igc_main.c

bc23aa949aeba0 Sasha Neftin 2020-01-29  6835  
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6836  #ifdef CONFIG_PM
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6837  static const struct dev_pm_ops igc_pm_ops = {
9513d2a5dc7f3f Sasha Neftin 2019-11-14 @6838  	SET_SYSTEM_SLEEP_PM_OPS(igc_suspend, igc_resume)
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6839  	SET_RUNTIME_PM_OPS(igc_runtime_suspend, igc_runtime_resume,
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6840  			   igc_runtime_idle)
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6841  };
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6842  #endif
9513d2a5dc7f3f Sasha Neftin 2019-11-14  6843  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [WIP v2] igc: fix deadlock caused by taking RTNL in RPM resume path
  2022-08-11 15:13                       ` Vinicius Costa Gomes
@ 2022-08-11 20:25                         ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 38+ messages in thread
From: Vinicius Costa Gomes @ 2022-08-11 20:25 UTC (permalink / raw)
  To: jhogan
  Cc: Vinicius Costa Gomes, Paul Menzel, Tony Nguyen, Jesse Brandeburg,
	netdev, intel-wired-lan, Sasha Neftin, Aleksandr Loktionov

An RTNL deadlock was reported in the igc driver, causing
problems during suspend/resume.

The solution is similar to commit ac8c58f5b535 ("igb: fix deadlock
caused by taking RTNL in RPM resume path").

Reported-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
Sorry for the noise earlier, my kernel config didn't have runtime PM
enabled.


 drivers/net/ethernet/intel/igc/igc_main.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index ebff0e04045d..9b0d4becfcfc 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6600,7 +6600,7 @@ static void igc_deliver_wake_packet(struct net_device *netdev)
 	netif_rx(skb);
 }
 
-static int __maybe_unused igc_resume(struct device *dev)
+static int __maybe_unused __igc_resume(struct device *dev, bool rpm)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct net_device *netdev = pci_get_drvdata(pdev);
@@ -6642,20 +6642,27 @@ static int __maybe_unused igc_resume(struct device *dev)
 
 	wr32(IGC_WUS, ~0);
 
-	rtnl_lock();
+	if (!rpm)
+		rtnl_lock();
 	if (!err && netif_running(netdev))
 		err = __igc_open(netdev, true);
 
 	if (!err)
 		netif_device_attach(netdev);
-	rtnl_unlock();
+	if (!rpm)
+		rtnl_unlock();
 
 	return err;
 }
 
 static int __maybe_unused igc_runtime_resume(struct device *dev)
 {
-	return igc_resume(dev);
+	return __igc_resume(dev, true);
+}
+
+static int __maybe_unused igc_resume(struct device *dev)
+{
+	return __igc_resume(dev, false);
 }
 
 static int __maybe_unused igc_suspend(struct device *dev)
@@ -6719,7 +6726,7 @@ static pci_ers_result_t igc_io_error_detected(struct pci_dev *pdev,
  *  @pdev: Pointer to PCI device
  *
  *  Restart the card from scratch, as if from a cold-boot. Implementation
- *  resembles the first-half of the igc_resume routine.
+ *  resembles the first-half of the __igc_resume routine.
  **/
 static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev)
 {
@@ -6758,7 +6765,7 @@ static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev)
  *
  *  This callback is called when the error recovery driver tells us that
  *  its OK to resume normal operation. Implementation resembles the
- *  second-half of the igc_resume routine.
+ *  second-half of the __igc_resume routine.
  */
 static void igc_io_resume(struct pci_dev *pdev)
 {
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [WIP v2] igc: fix deadlock caused by taking RTNL in RPM resume path
  2022-08-11 20:25                         ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-08-11 21:41                           ` James Hogan
  -1 siblings, 0 replies; 38+ messages in thread
From: James Hogan @ 2022-08-11 21:41 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: Vinicius Costa Gomes, Paul Menzel, Tony Nguyen, Jesse Brandeburg,
	netdev, intel-wired-lan, Sasha Neftin, Aleksandr Loktionov

On Thursday, 11 August 2022 21:25:24 BST Vinicius Costa Gomes wrote:
> An RTNL deadlock was reported in the igc driver, causing
> problems during suspend/resume.
> 
> The solution is similar to commit ac8c58f5b535 ("igb: fix deadlock
> caused by taking RTNL in RPM resume path").
> 
> Reported-by: James Hogan <jhogan@kernel.org>
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> ---
> Sorry for the noise earlier, my kernel config didn't have runtime PM
> enabled.

Thanks for looking into this.

This is identical to the patch I've been running for the last week. The 
deadlock is avoided, however I now occasionally see an assertion from 
netif_set_real_num_tx_queues due to the lock not being taken in some cases via 
the runtime_resume path, and a suspicious rcu_dereference_protected() warning 
(presumably due to the same issue of the lock not being taken). See here for 
details:
https://lore.kernel.org/netdev/4765029.31r3eYUQgx@saruman/

Cheers
James

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [WIP v2] igc: fix deadlock caused by taking RTNL in RPM resume path
  2022-08-11 21:41                           ` [Intel-wired-lan] " James Hogan
@ 2022-08-13  0:05                             ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 38+ messages in thread
From: Vinicius Costa Gomes @ 2022-08-13  0:05 UTC (permalink / raw)
  To: James Hogan
  Cc: Paul Menzel, Tony Nguyen, Jesse Brandeburg, netdev,
	intel-wired-lan, Sasha Neftin, Aleksandr Loktionov

Hi James,

James Hogan <jhogan@kernel.org> writes:

> On Thursday, 11 August 2022 21:25:24 BST Vinicius Costa Gomes wrote:
>> An RTNL deadlock was reported in the igc driver, causing
>> problems during suspend/resume.
>> 
>> The solution is similar to commit ac8c58f5b535 ("igb: fix deadlock
>> caused by taking RTNL in RPM resume path").
>> 
>> Reported-by: James Hogan <jhogan@kernel.org>
>> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>> ---
>> Sorry for the noise earlier, my kernel config didn't have runtime PM
>> enabled.
>
> Thanks for looking into this.
>
> This is identical to the patch I've been running for the last week. The 
> deadlock is avoided, however I now occasionally see an assertion from 
> netif_set_real_num_tx_queues due to the lock not being taken in some cases via 
> the runtime_resume path, and a suspicious rcu_dereference_protected() warning 
> (presumably due to the same issue of the lock not being taken). See here for 
> details:
> https://lore.kernel.org/netdev/4765029.31r3eYUQgx@saruman/

Oh, sorry. I missed the part that the rtnl assert splat was already
using similar/identical code to what I got/copied from igb.

So what this seems to be telling us is that the "fix" from igb is only
hiding the issue, and we would need to remove the need for taking the
RTNL for the suspend/resume paths in igc and igb? (as someone else said
in that igb thread, iirc)


Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [WIP v2] igc: fix deadlock caused by taking RTNL in RPM resume path
  2022-08-13  0:05                             ` [Intel-wired-lan] " Vinicius Costa Gomes
@ 2022-08-13 17:18                               ` James Hogan
  -1 siblings, 0 replies; 38+ messages in thread
From: James Hogan @ 2022-08-13 17:18 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: Paul Menzel, Tony Nguyen, Jesse Brandeburg, netdev,
	intel-wired-lan, Sasha Neftin, Aleksandr Loktionov

On Saturday, 13 August 2022 01:05:41 BST Vinicius Costa Gomes wrote:
> James Hogan <jhogan@kernel.org> writes:
> > On Thursday, 11 August 2022 21:25:24 BST Vinicius Costa Gomes wrote:
> >> An RTNL deadlock was reported in the igc driver, causing
> >> problems during suspend/resume.
> >> 
> >> The solution is similar to commit ac8c58f5b535 ("igb: fix deadlock
> >> caused by taking RTNL in RPM resume path").
> >> 
> >> Reported-by: James Hogan <jhogan@kernel.org>
> >> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> >> ---
> >> Sorry for the noise earlier, my kernel config didn't have runtime PM
> >> enabled.
> > 
> > Thanks for looking into this.
> > 
> > This is identical to the patch I've been running for the last week. The
> > deadlock is avoided, however I now occasionally see an assertion from
> > netif_set_real_num_tx_queues due to the lock not being taken in some cases
> > via the runtime_resume path, and a suspicious rcu_dereference_protected()
> > warning (presumably due to the same issue of the lock not being taken).
> > See here for details:
> > https://lore.kernel.org/netdev/4765029.31r3eYUQgx@saruman/
> 
> Oh, sorry. I missed the part that the rtnl assert splat was already
> using similar/identical code to what I got/copied from igb.
> 
> So what this seems to be telling us is that the "fix" from igb is only
> hiding the issue,

I suppose the patch just changes the assumption from "lock will never be held 
on runtime resume path" (incorrect, deadlock) to "lock will always be held on 
runtime resume path" (also incorrect, probably racy).

> and we would need to remove the need for taking the
> RTNL for the suspend/resume paths in igc and igb? (as someone else said
> in that igb thread, iirc)

(I'll defer to others on this. I'm pretty unfamiliar with networking code and 
this particular lock.)

Cheers
James



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [WIP v2] igc: fix deadlock caused by taking RTNL in RPM resume path
  2022-08-13 17:18                               ` [Intel-wired-lan] " James Hogan
@ 2022-08-29  8:16                                 ` James Hogan
  -1 siblings, 0 replies; 38+ messages in thread
From: James Hogan @ 2022-08-29  8:16 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: Paul Menzel, Tony Nguyen, Jesse Brandeburg, netdev,
	intel-wired-lan, Sasha Neftin, Aleksandr Loktionov

On Saturday, 13 August 2022 18:18:25 BST James Hogan wrote:
> On Saturday, 13 August 2022 01:05:41 BST Vinicius Costa Gomes wrote:
> > James Hogan <jhogan@kernel.org> writes:
> > > On Thursday, 11 August 2022 21:25:24 BST Vinicius Costa Gomes wrote:
> > >> An RTNL deadlock was reported in the igc driver, causing
> > >> problems during suspend/resume.
> > >> 
> > >> The solution is similar to commit ac8c58f5b535 ("igb: fix deadlock
> > >> caused by taking RTNL in RPM resume path").
> > >> 
> > >> Reported-by: James Hogan <jhogan@kernel.org>
> > >> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > >> ---
> > >> Sorry for the noise earlier, my kernel config didn't have runtime PM
> > >> enabled.
> > > 
> > > Thanks for looking into this.
> > > 
> > > This is identical to the patch I've been running for the last week. The
> > > deadlock is avoided, however I now occasionally see an assertion from
> > > netif_set_real_num_tx_queues due to the lock not being taken in some
> > > cases via the runtime_resume path, and a suspicious
> > > rcu_dereference_protected() warning (presumably due to the same issue of
> > > the lock not being taken). See here for details:
> > > https://lore.kernel.org/netdev/4765029.31r3eYUQgx@saruman/
> > 
> > Oh, sorry. I missed the part that the rtnl assert splat was already
> > using similar/identical code to what I got/copied from igb.
> > 
> > So what this seems to be telling us is that the "fix" from igb is only
> > hiding the issue,
> 
> I suppose the patch just changes the assumption from "lock will never be
> held on runtime resume path" (incorrect, deadlock) to "lock will always be
> held on runtime resume path" (also incorrect, probably racy).
> 
> > and we would need to remove the need for taking the
> > RTNL for the suspend/resume paths in igc and igb? (as someone else said
> > in that igb thread, iirc)
> 
> (I'll defer to others on this. I'm pretty unfamiliar with networking code
> and this particular lock.)

It'd be great to have this longstanding issue properly fixed rather than having 
to carry a patch locally that may not be lock safe.

Also, any tips for diagnosing the issue of the network link not coming back up 
after resume? I sometimes have to unload and reload the driver module to get 
it back again.

Cheers
James



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [WIP v2] igc: fix deadlock caused by taking RTNL in RPM resume path
  2022-08-29  8:16                                 ` [Intel-wired-lan] " James Hogan
@ 2022-10-02 10:56                                   ` James Hogan
  -1 siblings, 0 replies; 38+ messages in thread
From: James Hogan @ 2022-10-02 10:56 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: Paul Menzel, Tony Nguyen, Jesse Brandeburg, netdev,
	intel-wired-lan, Sasha Neftin, Aleksandr Loktionov

On Monday, 29 August 2022 09:16:33 BST James Hogan wrote:
> On Saturday, 13 August 2022 18:18:25 BST James Hogan wrote:
> > On Saturday, 13 August 2022 01:05:41 BST Vinicius Costa Gomes wrote:
> > > James Hogan <jhogan@kernel.org> writes:
> > > > On Thursday, 11 August 2022 21:25:24 BST Vinicius Costa Gomes wrote:
> > > >> An RTNL deadlock was reported in the igc driver, causing
> > > >> problems during suspend/resume.
> > > >> 
> > > >> The solution is similar to commit ac8c58f5b535 ("igb: fix deadlock
> > > >> caused by taking RTNL in RPM resume path").
> > > >> 
> > > >> Reported-by: James Hogan <jhogan@kernel.org>
> > > >> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > > >> ---
> > > >> Sorry for the noise earlier, my kernel config didn't have runtime PM
> > > >> enabled.
> > > > 
> > > > Thanks for looking into this.
> > > > 
> > > > This is identical to the patch I've been running for the last week.
> > > > The deadlock is avoided, however I now occasionally see an assertion
> > > > from netif_set_real_num_tx_queues due to the lock not being taken in
> > > > some cases via the runtime_resume path, and a suspicious
> > > > rcu_dereference_protected() warning (presumably due to the same issue
> > > > of the lock not being taken). See here for details:
> > > > https://lore.kernel.org/netdev/4765029.31r3eYUQgx@saruman/
> > > 
> > > Oh, sorry. I missed the part that the rtnl assert splat was already
> > > using similar/identical code to what I got/copied from igb.
> > > 
> > > So what this seems to be telling us is that the "fix" from igb is only
> > > hiding the issue,
> > 
> > I suppose the patch just changes the assumption from "lock will never be
> > held on runtime resume path" (incorrect, deadlock) to "lock will always be
> > held on runtime resume path" (also incorrect, probably racy).
> > 
> > > and we would need to remove the need for taking the
> > > RTNL for the suspend/resume paths in igc and igb? (as someone else said
> > > in that igb thread, iirc)
> > 
> > (I'll defer to others on this. I'm pretty unfamiliar with networking code
> > and this particular lock.)
> 
> It'd be great to have this longstanding issue properly fixed rather than
> having to carry a patch locally that may not be lock safe.
> 
> Also, any tips for diagnosing the issue of the network link not coming back
> up after resume? I sometimes have to unload and reload the driver module to
> get it back again.

Any thoughts on this from anybody?

Cheers
James
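
The two assumptions described above can be made concrete with a minimal, single-threaded userspace model in C. This is not igc or igb code. `struct model_lock` stands in for RTNL as a non-recursive lock, `reconfigure_queues()` stands in for a callee such as netif_set_real_num_tx_queues() that expects the lock to be held, and every name here is hypothetical. The `rpm` flag variant only approximates the shape of the igb workaround referenced above. It is an assumption, not a copy of that commit.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only (hypothetical names). model_lock stands in for
 * RTNL, a non-recursive lock: taking it while already held is recorded
 * as a self-deadlock instead of actually hanging.
 */
struct model_lock {
    bool held;
    int self_deadlocks;   /* take attempted while already held */
    int assert_failures;  /* "lock must be held" check failed */
};

static bool model_take(struct model_lock *l)
{
    if (l->held) {
        l->self_deadlocks++;  /* a real mutex would block forever here */
        return false;
    }
    l->held = true;
    return true;
}

static void model_release(struct model_lock *l)
{
    assert(l->held);          /* releasing an unheld lock would be a bug */
    l->held = false;
}

/* Stands in for code that expects RTNL to be held by its caller. */
static void reconfigure_queues(struct model_lock *l)
{
    if (!l->held)
        l->assert_failures++; /* the kernel would print a WARN splat */
}

/* "Lock is never held on entry": always take the lock in resume. */
static void resume_always_lock(struct model_lock *l)
{
    if (!model_take(l))
        return;               /* the real code would be stuck here */
    reconfigure_queues(l);
    model_release(l);
}

/* "Lock is always held on the runtime-PM path": skip it when rpm. */
static void resume_rpm_flag(struct model_lock *l, bool rpm)
{
    /* Only the system-resume path (rpm == false) takes the lock itself. */
    bool took = !rpm && model_take(l);

    reconfigure_queues(l);
    if (took)
        model_release(l);
}
```

In this model, a runtime resume reached from a caller that already holds the lock (the ethtool ioctl in the original call trace is presumably such a caller) self-deadlocks with `resume_always_lock()`, while `resume_rpm_flag(l, true)` trips the "lock must be held" check whenever a runtime resume arrives from a caller that does not hold it. Neither variant is correct for both kinds of caller.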



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [WIP v2] igc: fix deadlock caused by taking RTNL in RPM resume path
  2022-10-02 10:56                                   ` [Intel-wired-lan] " James Hogan
@ 2023-08-14 11:04                                     ` James Hogan
  -1 siblings, 0 replies; 38+ messages in thread
From: James Hogan @ 2023-08-14 11:04 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: Paul Menzel, Tony Nguyen, Jesse Brandeburg, netdev,
	intel-wired-lan, Sasha Neftin, Aleksandr Loktionov

On Sunday, 2 October 2022 11:56:28 BST James Hogan wrote:
> On Monday, 29 August 2022 09:16:33 BST James Hogan wrote:
> > On Saturday, 13 August 2022 18:18:25 BST James Hogan wrote:
> > > On Saturday, 13 August 2022 01:05:41 BST Vinicius Costa Gomes wrote:
> > > > James Hogan <jhogan@kernel.org> writes:
> > > > > On Thursday, 11 August 2022 21:25:24 BST Vinicius Costa Gomes wrote:
> > > > >> An RTNL deadlock was reported in the igc driver that was causing
> > > > >> problems during suspend/resume.
> > > > >> 
> > > > >> The solution is similar to commit ac8c58f5b535 ("igb: fix deadlock
> > > > >> caused by taking RTNL in RPM resume path").
> > > > >> 
> > > > >> Reported-by: James Hogan <jhogan@kernel.org>
> > > > >> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> > > > >> ---
> > > > >> Sorry for the noise earlier, my kernel config didn't have runtime
> > > > >> PM enabled.
> > > > > 
> > > > > Thanks for looking into this.
> > > > > 
> > > > > This is identical to the patch I've been running for the last week.
> > > > > The deadlock is avoided; however, I now occasionally see an
> > > > > assertion from netif_set_real_num_tx_queues due to the lock not
> > > > > being taken in some cases via the runtime_resume path, and a
> > > > > suspicious rcu_dereference_protected() warning (presumably due to
> > > > > the same issue of the lock not being taken). See here for details:
> > > > > https://lore.kernel.org/netdev/4765029.31r3eYUQgx@saruman/
> > > > 
> > > > Oh, sorry. I missed the part that the rtnl assert splat was already
> > > > using similar/identical code to what I got/copied from igb.
> > > > 
> > > > So what this seems to be telling us is that the "fix" from igb is only
> > > > hiding the issue,
> > > 
> > > I suppose the patch just changes the assumption from "lock will never be
> > > held on runtime resume path" (incorrect, deadlock) to "lock will always
> > > be held on runtime resume path" (also incorrect, probably racy).
> > > 
> > > > and we would need to remove the need for taking the
> > > > RTNL for the suspend/resume paths in igc and igb? (as someone else
> > > > said in that igb thread, iirc)
> > > 
> > > (I'll defer to others on this. I'm pretty unfamiliar with networking
> > > code and this particular lock.)
> > 
> > It'd be great to have this longstanding issue properly fixed rather than
> > having to carry a patch locally that may not be lock safe.
> > 
> > Also, any tips for diagnosing the issue of the network link not coming
> > back up after resume? I sometimes have to unload and reload the driver
> > module to get it back again.
> 
> Any thoughts on this from anybody?

Ping... I've been carrying this patch locally on archlinux for almost a year
now. Every time I update my kernel and forget to rebuild with the patch, it
catches me out with deadlocks after resume, and even with the patch I
frequently have to reload the igc module after resume to get the network to
come up (which is preferable to deadlocks but still really sucks). I'd really
appreciate it if this could get some attention.

Many thanks
James



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] [WIP v2] igc: fix deadlock caused by taking RTNL in RPM resume path
  2023-08-14 11:04                                     ` [Intel-wired-lan] " James Hogan
@ 2023-08-29  1:58                                       ` Vinicius Costa Gomes
  -1 siblings, 0 replies; 38+ messages in thread
From: Vinicius Costa Gomes @ 2023-08-29  1:58 UTC (permalink / raw)
  To: James Hogan
  Cc: Paul Menzel, netdev, Jesse Brandeburg, Aleksandr Loktionov,
	intel-wired-lan, Tony Nguyen

Hi James,

James Hogan <jhogan@kernel.org> writes:

> On Sunday, 2 October 2022 11:56:28 BST James Hogan wrote:
>> On Monday, 29 August 2022 09:16:33 BST James Hogan wrote:
>> > On Saturday, 13 August 2022 18:18:25 BST James Hogan wrote:
>> > > On Saturday, 13 August 2022 01:05:41 BST Vinicius Costa Gomes wrote:
>> > > > James Hogan <jhogan@kernel.org> writes:
>> > > > > On Thursday, 11 August 2022 21:25:24 BST Vinicius Costa Gomes wrote:
>> > > > >> An RTNL deadlock was reported in the igc driver that was causing
>> > > > >> problems during suspend/resume.
>> > > > >> 
>> > > > >> The solution is similar to commit ac8c58f5b535 ("igb: fix deadlock
>> > > > >> caused by taking RTNL in RPM resume path").
>> > > > >> 
>> > > > >> Reported-by: James Hogan <jhogan@kernel.org>
>> > > > >> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>> > > > >> ---
>> > > > >> Sorry for the noise earlier, my kernel config didn't have runtime
>> > > > >> PM enabled.
>> > > > > 
>> > > > > Thanks for looking into this.
>> > > > > 
>> > > > > This is identical to the patch I've been running for the last week.
>> > > > > The deadlock is avoided; however, I now occasionally see an
>> > > > > assertion from netif_set_real_num_tx_queues due to the lock not
>> > > > > being taken in some cases via the runtime_resume path, and a
>> > > > > suspicious rcu_dereference_protected() warning (presumably due to
>> > > > > the same issue of the lock not being taken). See here for details:
>> > > > > https://lore.kernel.org/netdev/4765029.31r3eYUQgx@saruman/
>> > > > 
>> > > > Oh, sorry. I missed the part that the rtnl assert splat was already
>> > > > using similar/identical code to what I got/copied from igb.
>> > > > 
>> > > > So what this seems to be telling us is that the "fix" from igb is only
>> > > > hiding the issue,
>> > > 
>> > > I suppose the patch just changes the assumption from "lock will never be
>> > > held on runtime resume path" (incorrect, deadlock) to "lock will always
>> > > be held on runtime resume path" (also incorrect, probably racy).
>> > > 
>> > > > and we would need to remove the need for taking the
>> > > > RTNL for the suspend/resume paths in igc and igb? (as someone else
>> > > > said in that igb thread, iirc)
>> > > 
>> > > (I'll defer to others on this. I'm pretty unfamiliar with networking
>> > > code and this particular lock.)
>> > 
>> > It'd be great to have this longstanding issue properly fixed rather than
>> > having to carry a patch locally that may not be lock safe.
>> > 
>> > Also, any tips for diagnosing the issue of the network link not coming
>> > back up after resume? I sometimes have to unload and reload the driver
>> > module to get it back again.
>> 
>> Any thoughts on this from anybody?
>
> Ping... I've been carrying this patch locally on archlinux for almost a year
> now. Every time I update my kernel and forget to rebuild with the patch, it
> catches me out with deadlocks after resume, and even with the patch I
> frequently have to reload the igc module after resume to get the network to
> come up (which is preferable to deadlocks but still really sucks). I'd really
> appreciate it if this could get some attention.

I am setting up my test systems to reproduce the deadlocks, and then we
can see what ideas come up for removing the need for those locks.

About the link failures, are there any error messages in the kernel
logs? (Also, it would help if you could share those logs; off-list is
fine.) I am trying to think what could be happening, and how to further
debug this.


Cheers,
-- 
Vinicius
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [WIP v2] igc: fix deadlock caused by taking RTNL in RPM resume path
  2023-08-29  1:58                                       ` Vinicius Costa Gomes
@ 2023-09-03 17:57                                         ` James Hogan
  -1 siblings, 0 replies; 38+ messages in thread
From: James Hogan @ 2023-09-03 17:57 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: Paul Menzel, Tony Nguyen, Jesse Brandeburg, netdev,
	intel-wired-lan, Sasha Neftin, Aleksandr Loktionov, Neftin,
	Sasha

On Tuesday, 29 August 2023 02:58:42 BST Vinicius Costa Gomes wrote:
> James Hogan <jhogan@kernel.org> writes:
> > On Sunday, 2 October 2022 11:56:28 BST James Hogan wrote:
> >> On Monday, 29 August 2022 09:16:33 BST James Hogan wrote:
> >> > It'd be great to have this longstanding issue properly fixed rather than
> >> > having to carry a patch locally that may not be lock safe.
> >> > 
> >> > Also, any tips for diagnosing the issue of the network link not coming
> >> > back up after resume? I sometimes have to unload and reload the driver
> >> > module to get it back again.
> >> 
> >> Any thoughts on this from anybody?
> > 
> > Ping... I've been carrying this patch locally on archlinux for almost a
> > year now. Every time I update my kernel and forget to rebuild with the
> > patch, it catches me out with deadlocks after resume, and even with the
> > patch I frequently have to reload the igc module after resume to get the
> > network to come up (which is preferable to deadlocks but still really
> > sucks). I'd really appreciate it if this could get some attention.
> 
> I am setting up my test systems to reproduce the deadlocks, and then we
> can see what ideas come up for removing the need for those locks.
> 
> About the link failures, are there any error messages in the kernel
> logs? (Also, it would help if you could share those logs; off-list is
> fine.) I am trying to think what could be happening, and how to further
> debug this.

Looking through the resume log, the only network/igc related items are these:

Sep 03 18:28:17 saruman kernel: igc 0000:06:00.0 enp6s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Sep 03 18:28:17 saruman NetworkManager[1016]: <info>  [1693762097.7180] manager: sleep: wake requested (sleeping: yes  enabled: yes)
Sep 03 18:28:17 saruman NetworkManager[1016]: <info>  [1693762097.7181] device (enp6s0): state change: activated -> unmanaged (reason 'sleeping', sys-iface-state: 'managed')
Sep 03 18:28:17 saruman avahi-daemon[989]: Withdrawing address record for 192.168.1.239 on enp6s0.
Sep 03 18:28:17 saruman avahi-daemon[989]: Leaving mDNS multicast group on interface enp6s0.IPv4 with address 192.168.1.239.
Sep 03 18:28:17 saruman avahi-daemon[989]: Interface enp6s0.IPv4 no longer relevant for mDNS.
Sep 03 18:28:17 saruman NetworkManager[1016]: <info>  [1693762097.8202] manager: NetworkManager state is now CONNECTED_GLOBAL
Sep 03 18:28:17 saruman NetworkManager[1016]: <info>  [1693762097.8657] manager: NetworkManager state is now DISCONNECTED
Sep 03 18:28:17 saruman NetworkManager[1016]: <info>  [1693762097.8660] device (enp6s0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Sep 03 18:28:17 saruman systemd[1]: Starting Network Manager Script Dispatcher Service...
Sep 03 18:28:17 saruman systemd[1]: Started Network Manager Script Dispatcher Service.
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3075] device (enp6s0): carrier: link connected
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3076] device (enp6s0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3080] policy: auto-activating connection 'Wired connection 1' (f6634f16-77ca-34f7-846a-8c41e15a8ad1)
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3082] device (enp6s0): Activation: starting connection 'Wired connection 1' (f6634f16-77ca-34f7-846a-8c41e15a8ad1)
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3082] device (enp6s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3083] manager: NetworkManager state is now CONNECTING
Sep 03 18:28:21 saruman kernel: igc 0000:06:00.0 enp6s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Sep 03 18:28:21 saruman kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp6s0: link becomes ready
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3506] device (enp6s0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3512] device (enp6s0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Sep 03 18:28:21 saruman NetworkManager[1016]: <info>  [1693762101.3515] policy: set 'Wired connection 1' (enp6s0) as default for IPv4 routing and DNS
Sep 03 18:28:21 saruman avahi-daemon[989]: Joining mDNS multicast group on interface enp6s0.IPv4 with address 192.168.1.239.
Sep 03 18:28:21 saruman avahi-daemon[989]: New relevant interface enp6s0.IPv4 for mDNS.
Sep 03 18:28:21 saruman avahi-daemon[989]: Registering new address record for 192.168.1.239 on enp6s0.IPv4.
Sep 03 18:28:22 saruman systemd[1]: systemd-rfkill.service: Deactivated successfully.
Sep 03 18:28:23 saruman NetworkManager[1016]: <info>  [1693762103.3544] device (enp6s0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Sep 03 18:28:23 saruman NetworkManager[1016]: <info>  [1693762103.3553] device (enp6s0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Sep 03 18:28:23 saruman NetworkManager[1016]: <info>  [1693762103.3554] device (enp6s0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Sep 03 18:28:23 saruman NetworkManager[1016]: <info>  [1693762103.3555] manager: NetworkManager state is now CONNECTED_SITE
Sep 03 18:28:23 saruman NetworkManager[1016]: <info>  [1693762103.3556] device (enp6s0): Activation: successful, device activated.
Sep 03 18:28:27 saruman NetworkManager[1016]: <info>  [1693762107.3532] device (enp6s0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Sep 03 18:28:27 saruman avahi-daemon[989]: Withdrawing address record for 192.168.1.239 on enp6s0.
Sep 03 18:28:27 saruman avahi-daemon[989]: Leaving mDNS multicast group on interface enp6s0.IPv4 with address 192.168.1.239.
Sep 03 18:28:27 saruman avahi-daemon[989]: Interface enp6s0.IPv4 no longer relevant for mDNS.
Sep 03 18:28:27 saruman NetworkManager[1016]: <info>  [1693762107.5266] manager: NetworkManager state is now CONNECTED_LOCAL
Sep 03 18:28:27 saruman NetworkManager[1016]: <info>  [1693762107.5267] manager: NetworkManager state is now DISCONNECTED
Sep 03 18:28:27 saruman systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
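
When triaging a log like the one above, it can help to pull out only the NetworkManager device state transitions and their reasons. The following is a minimal C sketch for that. `struct nm_change` and `parse_nm_change()` are hypothetical helpers, and the line format is inferred only from the log lines quoted in this message, not from any NetworkManager specification.

```c
#include <stdio.h>
#include <string.h>

/* One parsed NetworkManager "state change:" line (hypothetical type). */
struct nm_change {
    double ts;        /* NetworkManager's own [seconds.fraction] timestamp */
    char from[32];    /* old device state, e.g. "activated" */
    char to[32];      /* new device state, e.g. "unavailable" */
    char reason[32];  /* e.g. "carrier-changed" */
};

/*
 * Returns 1 and fills *c if the line is an <info> "state change:" line,
 * otherwise 0. Format inferred from the log above; other versions may differ.
 */
static int parse_nm_change(const char *line, struct nm_change *c)
{
    /* Start after "<info>" so the "[1016]" PID bracket is not mistaken
     * for the timestamp. */
    const char *p = strstr(line, "<info>");

    if (!p)
        return 0;
    p = strchr(p, '[');
    if (!p || sscanf(p, "[%lf]", &c->ts) != 1)
        return 0;
    p = strstr(p, "state change: ");
    if (!p)
        return 0;
    return sscanf(p, "state change: %31s -> %31s (reason '%31[^']'",
                  c->from, c->to, c->reason) == 3;
}
```

Applied to the log above, the "secondaries -> activated" transition is stamped 1693762103.3554 and the "activated -> unavailable (reason 'carrier-changed')" transition is stamped 1693762107.3532, so the carrier was lost roughly 4 seconds after the device finished activating.
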

As mentioned previously, I have CONFIG_PROVE_LOCKING=y, and I'm seeing splats
during boot, notably "RTNL assertion failed at net/core/dev.c (2877)" and
"suspicious RCU usage".

Cheers
James
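
For the question of why the link does not come back after resume, one low-tech option is to sample the standard netdev sysfs attributes /sys/class/net/<ifname>/carrier and /sys/class/net/<ifname>/carrier_changes around resume. This approach is an assumption and was not proposed in the thread. The minimal C sketch below reads them. The helper names are hypothetical.

```c
#include <stdio.h>

/*
 * Parse one integer from an open sysfs attribute such as
 * /sys/class/net/<ifname>/carrier (1 = carrier present, 0 = none) or
 * /sys/class/net/<ifname>/carrier_changes. Returns -1 on failure. Note
 * that the kernel refuses to report "carrier" for an interface that is
 * administratively down, so a read error there is expected.
 */
static long parse_attr_long(FILE *f)
{
    long value;

    if (!f || fscanf(f, "%ld", &value) != 1)
        return -1;
    return value;
}

/* Open, parse and close an attribute by path (hypothetical helper). */
static long read_attr_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = parse_attr_long(f);

    if (f)
        fclose(f);
    return value;
}

/*
 * Carrier transitions between two carrier_changes samples. Returns -1 if
 * a sample is invalid or the counter went backwards (e.g. the interface
 * was re-created by a module reload).
 */
static long carrier_transitions(long before, long after)
{
    if (before < 0 || after < before)
        return -1;
    return after - before;
}
```

For example, reading /sys/class/net/enp6s0/carrier_changes right after resume and again some seconds later would show whether the drop logged at 18:28:27 is a single transition or the start of repeated flapping.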

^ permalink raw reply	[flat|nested] 38+ messages in thread


As mentioned previously, I have CONFIG_PROVE_LOCKING=y, and I'm seeing splats during boot, notably "RTNL assertion failed at net/core/dev.c (2877)" and "suspicious RCU usage".

Cheers
James

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan


end of thread, other threads:[~2023-09-03 17:58 UTC | newest]

Thread overview: 38+ messages
2022-07-14  8:14 [Intel-wired-lan] I225-V (igc driver) hangs after resume in igc_resume/igc_tsn_reset James Hogan
2022-07-15 17:25 ` Tony Nguyen
2022-07-17 19:59 ` Vinicius Costa Gomes
     [not found]   ` <4773114.31r3eYUQgx@saruman>
2022-07-23 15:52     ` James Hogan
2022-07-27 14:37       ` Vinicius Costa Gomes
2022-07-28 17:36         ` James Hogan
2022-08-04 13:03           ` James Hogan
2022-08-04 13:27             ` Paul Menzel
2022-08-04 21:41               ` James Hogan
2022-08-04 21:41                 ` James Hogan
2022-08-04 22:07                 ` James Hogan
2022-08-04 22:07                   ` James Hogan
2022-08-05 11:25                   ` James Hogan
2022-08-05 11:25                     ` James Hogan
2022-08-11 15:13                     ` [Intel-wired-lan] [PATCH] igc: fix deadlock caused by taking RTNL in RPM resume path Vinicius Costa Gomes
2022-08-11 15:13                       ` Vinicius Costa Gomes
2022-08-11 18:58                       ` [Intel-wired-lan] " kernel test robot
2022-08-11 18:58                         ` kernel test robot
2022-08-11 19:59                       ` kernel test robot
2022-08-11 19:59                         ` kernel test robot
2022-08-11 20:25                       ` [WIP v2] " Vinicius Costa Gomes
2022-08-11 20:25                         ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-08-11 21:41                         ` James Hogan
2022-08-11 21:41                           ` [Intel-wired-lan] " James Hogan
2022-08-13  0:05                           ` Vinicius Costa Gomes
2022-08-13  0:05                             ` [Intel-wired-lan] " Vinicius Costa Gomes
2022-08-13 17:18                             ` James Hogan
2022-08-13 17:18                               ` [Intel-wired-lan] " James Hogan
2022-08-29  8:16                               ` James Hogan
2022-08-29  8:16                                 ` [Intel-wired-lan] " James Hogan
2022-10-02 10:56                                 ` James Hogan
2022-10-02 10:56                                   ` [Intel-wired-lan] " James Hogan
2023-08-14 11:04                                   ` James Hogan
2023-08-14 11:04                                     ` [Intel-wired-lan] " James Hogan
2023-08-29  1:58                                     ` Vinicius Costa Gomes
2023-08-29  1:58                                       ` Vinicius Costa Gomes
2023-09-03 17:57                                       ` James Hogan
2023-09-03 17:57                                         ` [Intel-wired-lan] " James Hogan
