4.14-rt: Fix wrong-variable use in irq_set_affinity_notifier.

The bug was introduced in the 4.14-rt patch

   0461-genirq-Handle-missing-work_struct-in-irq_set_affinit.patch

The symptom is a NULL pointer panic in the i40e driver on
system shutdown.

    Rebooting.
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
    IP: __kthread_cancel_work_sync+0x12/0xa0
    CPU: 15 PID: 6274 Comm: reboot Not tainted 4.14.155-rt70-RedHawk-8.0.2-prt-trace #1
    task: ffff9ef0d1a58000 task.stack: ffffbe540c038000
    RIP: 0010:__kthread_cancel_work_sync+0x12/0xa0
    RSP: 0018:ffffbe540c03bbd8 EFLAGS: 00010296
    RAX: 0000084000000020 RBX: 0000000000000000 RCX: 0000000000000034
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000008
    RBP: ffffbe540c03bc00 R08: ffff9ee8ccdc3800 R09: ffff9ef0d8c0c000
    R10: ffff9ef0d8c0c028 R11: 0000000000000040 R12: ffff9ee8ccdc3800
    R13: 0000000000000000 R14: ffff9ee8ccdc3960 R15: 0000000000000074
    FS:  00007ffff7fcf380(0000) GS:ffff9ef0ffdc0000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000020 CR3: 000000104b428003 CR4: 00000000005606e0
    DR0: 00000000006040e0 DR1: 00000000006040e8 DR2: 00000000006040f0
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
    PKRU: 55555554
    Call Trace:
     kthread_cancel_work_sync+0xb/0x10
     irq_set_affinity_notifier+0x8e/0xc0
     i40e_vsi_free_irq+0xbc/0x230 [i40e]
     i40e_vsi_close+0x24/0xa0 [i40e]
     i40e_close+0x10/0x20 [i40e]
     i40e_quiesce_vsi.part.40+0x30/0x40 [i40e]
     i40e_pf_quiesce_all_vsi.isra.41+0x34/0x50 [i40e]
     i40e_prep_for_reset+0x67/0x110 [i40e]
     i40e_shutdown+0x39/0x220 [i40e]
     pci_device_shutdown+0x2b/0x50
     device_shutdown+0x147/0x1f0
     kernel_restart_prepare+0x71/0x74
     kernel_restart+0xd/0x4e
     SyS_reboot.cold.1+0x9/0x34
     do_syscall_64+0x7c/0x150

4.19-rt and above do not have this problem due to a refactoring.

Signed-off-by: Joe Korty <Joe.Korty@concurrent-rt.com>

Index: b/kernel/irq/manage.c
===================================================================
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -353,7 +353,7 @@ irq_set_affinity_notifier(unsigned int i
 
 	if (old_notify) {
 #ifdef CONFIG_PREEMPT_RT_BASE
-		kthread_cancel_work_sync(&notify->work);
+		kthread_cancel_work_sync(&old_notify->work);
 #else
 		cancel_work_sync(&old_notify->work);
 #endif

On 2020-02-11 09:00:07 [-0500], Joe Korty wrote:
> 4.14-rt: Fix wrong-variable use in irq_set_affinity_notifier.
>
> The bug was introduced in the 4.14-rt patch
>
>    0461-genirq-Handle-missing-work_struct-in-irq_set_affinit.patch
>
> The symptom is a NULL pointer panic in the i40e driver on
> system shutdown.
…
> 4.19-rt and above do not have this problem due to a refactoring.

That would be Tom's to pick. Is v4.14-RT the only one affected? I was
under the impression that we fixed it already in each stable rt tree.

> Signed-off-by: Joe Korty <Joe.Korty@concurrent-rt.com>
>
> Index: b/kernel/irq/manage.c
> ===================================================================
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -353,7 +353,7 @@ irq_set_affinity_notifier(unsigned int i
>
>  	if (old_notify) {
>  #ifdef CONFIG_PREEMPT_RT_BASE
> -		kthread_cancel_work_sync(&notify->work);
> +		kthread_cancel_work_sync(&old_notify->work);
>  #else
>  		cancel_work_sync(&old_notify->work);
>  #endif

Sebastian

On Tue, Feb 11, 2020 at 06:49:15PM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-02-11 09:00:07 [-0500], Joe Korty wrote:
> > 4.14-rt: Fix wrong-variable use in irq_set_affinity_notifier.
> >
> > The bug was introduced in the 4.14-rt patch
> >
> >    0461-genirq-Handle-missing-work_struct-in-irq_set_affinit.patch
> >
> > The symptom is a NULL pointer panic in the i40e driver on
> > system shutdown.
> …
> > 4.19-rt and above do not have this problem due to a refactoring.
>
> That would be Tom's to pick. Is v4.14-RT the only one affected? I was
> under the impression that we fixed it already in each stable rt tree.

A quick grep of all the -rt patch files in kernel.org/../projects/rt, newer
than May, 2019, shows that 4.14 is the only one needing a fix.

patch-4.14.170-rt74.patch.xz:
+		kthread_cancel_work_sync(&notify->work);
patch-4.4.208-rt191.patch.xz:
patch-3.18.18-rt15.patch.xz:
patch-5.0.21-rt16.patch.xz:
patch-5.2.21-rt14.patch.xz:
patch-5.4.17-rt9.patch.xz:
patch-4.9.201-rt134.patch.xz:

> > Signed-off-by: Joe Korty <Joe.Korty@concurrent-rt.com>
> >
> > Index: b/kernel/irq/manage.c
> > ===================================================================
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -353,7 +353,7 @@ irq_set_affinity_notifier(unsigned int i
> >
> >  	if (old_notify) {
> >  #ifdef CONFIG_PREEMPT_RT_BASE
> > -		kthread_cancel_work_sync(&notify->work);
> > +		kthread_cancel_work_sync(&old_notify->work);
> >  #else
> >  		cancel_work_sync(&old_notify->work);
> >  #endif
>
> Sebastian

--
Regards,
Joe
_________________________________________________
Joe Korty
Concurrent Real-Time, Inc.
2881 Gateway Drive
Pompano Beach, Florida USA 33069
Phone: +1 954.973.5262
Email: joe.korty@concurrent-rt.com
_________________________________________________

Hi Joe,

On Tue, 2020-02-11 at 13:40 -0500, Joe Korty wrote:
> On Tue, Feb 11, 2020 at 06:49:15PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2020-02-11 09:00:07 [-0500], Joe Korty wrote:
> > > 4.14-rt: Fix wrong-variable use in irq_set_affinity_notifier.
> > >
> > > The bug was introduced in the 4.14-rt patch
> > >
> > >    0461-genirq-Handle-missing-work_struct-in-irq_set_affinit.patch
> > >
> > > The symptom is a NULL pointer panic in the i40e driver on
> > > system shutdown.
> >
> > …
> > > 4.19-rt and above do not have this problem due to a refactoring.
> >
> > That would be Tom's to pick. Is v4.14-RT the only one affected? I was
> > under the impression that we fixed it already in each stable rt tree.
>
> A quick grep of all the -rt patch files in kernel.org/../projects/rt, newer
> than May, 2019, shows that 4.14 is the only one needing a fix.
>

Yeah, what happened here was that I screwed up when fixing a merge
conflict, and inadvertently changed old_notify->work to notify->work in
the fix.

Thanks for pointing this out - I'll include your patch doing the right
thing in the next update, coming up shortly..

Tom

> patch-4.14.170-rt74.patch.xz:
> +		kthread_cancel_work_sync(&notify->work);
> patch-4.4.208-rt191.patch.xz:
> patch-3.18.18-rt15.patch.xz:
> patch-5.0.21-rt16.patch.xz:
> patch-5.2.21-rt14.patch.xz:
> patch-5.4.17-rt9.patch.xz:
> patch-4.9.201-rt134.patch.xz:
>
> > > Signed-off-by: Joe Korty <Joe.Korty@concurrent-rt.com>
> > >
> > > Index: b/kernel/irq/manage.c
> > > ===================================================================
> > > --- a/kernel/irq/manage.c
> > > +++ b/kernel/irq/manage.c
> > > @@ -353,7 +353,7 @@ irq_set_affinity_notifier(unsigned int i
> > >
> > >  	if (old_notify) {
> > >  #ifdef CONFIG_PREEMPT_RT_BASE
> > > -		kthread_cancel_work_sync(&notify->work);
> > > +		kthread_cancel_work_sync(&old_notify->work);
> > >  #else
> > >  		cancel_work_sync(&old_notify->work);
> > >  #endif
> >
> > Sebastian
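
For readers following the thread, the old_notify/notify mix-up can be
illustrated with a small userspace sketch. It is not the kernel code: the
sim_* types and functions below are hypothetical stand-ins, and it assumes
(consistent with the call trace, but not stated in the thread) that the
driver unregisters its notifier by passing NULL. In that case notify is
NULL while old_notify is the only valid object, so the work item to cancel
must come from old_notify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct sim_work {
	int pending;                        /* 1 while a callback is queued */
};

struct sim_notify {
	unsigned int irq;
	struct sim_work work;
};

struct sim_desc {
	struct sim_notify *affinity_notify; /* registered notifier, or NULL */
};

/* Stand-in for kthread_cancel_work_sync(): clears the pending flag and
 * reports whether work was pending. */
static int sim_cancel_work_sync(struct sim_work *work)
{
	int was_pending = work->pending;

	work->pending = 0;
	return was_pending;
}

/* Install 'notify' (NULL unregisters) and cancel work still queued for
 * the notifier being replaced. Returns 1 if pending work was cancelled. */
static int sim_set_affinity_notifier(struct sim_desc *desc,
				     struct sim_notify *notify)
{
	struct sim_notify *old_notify;

	assert(desc != NULL);

	old_notify = desc->affinity_notify;
	desc->affinity_notify = notify;

	if (old_notify) {
		/* Correct: act on the notifier that was just removed.
		 * Passing &notify->work instead would hand the cancel
		 * routine a NULL-based address whenever notify == NULL. */
		return sim_cancel_work_sync(&old_notify->work);
	}
	return 0;
}
```

Registering a notifier and then calling sim_set_affinity_notifier(&desc,
NULL) exercises the path seen at shutdown: the old notifier's work is
cancelled and the new (NULL) pointer is never dereferenced.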