Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/urgent

commit 85f1abe0019fcb3ea10df7029056cf42702283a8
Author:     Peter Zijlstra
AuthorDate: Tue May 1 18:14:45 2018 +0200
Commit:     Ingo Molnar
CommitDate: Thu May 3 07:38:05 2018 +0200

    kthread, sched/wait: Fix kthread_parkme() completion issue

    Even with the wait-loop fixed, there is a further issue with
    kthread_parkme(). Upon hotplug, when we do takedown_cpu(),
    smpboot_park_threads() can return before all those threads are in fact
    blocked, due to the placement of the complete() in __kthread_parkme().

    When that happens, sched_cpu_dying() -> migrate_tasks() can end up
    migrating such a still runnable task onto another CPU.

    Normally the task will have hit schedule() and gone to sleep by the
    time we do kthread_unpark(), which will then do __kthread_bind() to
    re-bind the task to the correct CPU.

    However, when we lose the initial TASK_PARKED store to the concurrent
    wakeup issue described previously, do the complete(), get migrated, it
    is possible to either:

     - observe kthread_unpark()'s clearing of SHOULD_PARK and terminate
       the park and set TASK_RUNNING, or

     - __kthread_bind()'s wait_task_inactive() to observe the competing
       TASK_RUNNING store.

    Either way the WARN() in __kthread_bind() will trigger and fail to
    correctly set the CPU affinity.

    Fix this by only issuing the complete() when the kthread has scheduled
    out. This does away with all the icky 'still running' nonsense.

    The alternative is to promote TASK_PARKED to a special state, this
    guarantees wait_task_inactive() cannot observe a 'stale' TASK_RUNNING
    and we'll end up doing the right thing, but this preserves the whole
    icky business of potentially migrating the still runnable thing.

    Reported-by: Gaurav Kohli
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

741a76b350  kthread, sched/wait: Fix kthread_parkme() wait-loop
85f1abe001  kthread, sched/wait: Fix kthread_parkme() completion issue
12bc056bd1  Merge branch 'sched/urgent'

+-------------------------------------------+------------+------------+------------+
|                                           | 741a76b350 | 85f1abe001 | 12bc056bd1 |
+-------------------------------------------+------------+------------+------------+
| boot_successes                            | 44         | 0          | 0          |
| boot_failures                             | 0          | 26         | 22         |
| WARNING:at_kernel/kthread.c:#kthread_park | 0          | 26         | 22         |
| EIP:kthread_park                          | 0          | 26         | 22         |
+-------------------------------------------+------------+------------+------------+

[    0.011005] CPU: GenuineIntel Intel Core Processor (Haswell) (family: 0x6, model: 0x3c, stepping: 0x4)
[    0.012011] Spectre V2 : Spectre mitigation: kernel not compiled with retpoline; no mitigation available!
[    0.013949] Performance Events: no PMU driver, software events only.
[    0.019490] NMI watchdog: Perf event create on CPU 0 failed with -2
[    0.020005] NMI watchdog: Perf NMI watchdog permanently disabled
[    0.020692] WARNING: CPU: 0 PID: 1 at kernel/kthread.c:486 kthread_park+0x2b/0x56
[    0.021000] CPU: 0 PID: 1 Comm: swapper Not tainted 4.17.0-rc3-00038-g85f1abe #687
[    0.021000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[    0.021000] EIP: kthread_park+0x2b/0x56
[    0.021000] EFLAGS: 00210202 CPU: 0
[    0.021000] EAX: cf464d40 EBX: cf422500 ECX: 00000000 EDX: 00000004
[    0.021000] ESI: c221e2c0 EDI: 00000000 EBP: cf42df68 ESP: cf42df60
[    0.021000]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[    0.021000] CR0: 80050033 CR2: ffffffff CR3: 022e8000 CR4: 00140690
[    0.021000] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    0.021000] DR6: fffe0ff0 DR7: 00000400
[    0.021000] Call Trace:
[    0.021000]  smpboot_update_cpumask_percpu_thread+0x28/0x42
[    0.021000]  softlockup_update_smpboot_threads+0x37/0x39
[    0.021000]  lockup_detector_reconfigure+0x17/0x62
[    0.021000]  lockup_detector_init+0x5d/0x69
[    0.021000]  kernel_init_freeable+0x52/0x15c
[    0.021000]  ? rest_init+0xf4/0xf4
[    0.021000]  kernel_init+0x8/0xd0
[    0.021000]  ret_from_fork+0x19/0x30
[    0.021000] Code: 55 89 e5 56 53 8b 50 14 0f ba e2 15 72 02 0f 0b 8b 98 44 03 00 00 80 e2 04 74 09 0f 0b be da ff ff ff eb 2c 8b 13 80 e2 04 74 09 <0f> 0b be f0 ff ff ff eb 1c 80 0b 04 8b 15 70 de f6 c1 31 f6 39
[    0.021000] ---[ end trace 3a71adb42feecba7 ]---
[    0.021060] TSC deadline timer enabled

                                                          # HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start d17cc3a1a1091797eff4a671659c15f2dc667996 6d08b06e67cd117f6992c46611dfb4ce267cd71e --
git bisect good 99309e94100238f4e0d1d6bdc31f4034587d5fb9  # 19:13 G 11 0 0 1 Merge 'linux-review/Bj-rn-Mork/qmi_wwan-do-not-steal-interfaces-from-class-drivers/20180503-164137' into devel-catchup-201805031730
git bisect  bad 067517401c4ab83974fb63d7c4a98eb82791d15f  # 19:22 B 0 3 27 11 Merge 'tip/sched/urgent' into devel-catchup-201805031730
git bisect good 7ff5000268355c63dc948ecb01f4de17987586e5  # 19:35 G 11 0 0 1 Merge tag 'sound-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good 0d95cfa922c24bcc20b5ccf7496b6ac7c8e29efb  # 20:44 G 11 0 0 0 Merge tag 'powerpc-4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good 65f4d6d0f80b3c55830ec5735194703fa2909ba1  # 20:58 G 11 0 0 1 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 2d618bdf71635463a4aa4ad0fe46ec852292bc0c  # 21:00 G 10 0 0 0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel
git bisect good ecd649b3408408841d5793038b0241e55ac7a141  # 21:09 G 11 0 0 0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect good dcf234577cd31fa16874e828b90659166ad6b80d  # 21:20 G 11 0 0 0 tracing: Add field modifier parsing hist error for hist triggers
git bisect good 0b26351b910fb8fe6a056f8a1bbccabe50c0e19f  # 21:35 G 11 0 0 1 stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
git bisect good 741a76b350897604c48fb12beff1c9b77724dc96  # 21:51 G 11 0 0 2 kthread, sched/wait: Fix kthread_parkme() wait-loop
git bisect  bad 85f1abe0019fcb3ea10df7029056cf42702283a8  # 22:03 B 0 11 35 11 kthread, sched/wait: Fix kthread_parkme() completion issue
# first bad commit: [85f1abe0019fcb3ea10df7029056cf42702283a8] kthread, sched/wait: Fix kthread_parkme() completion issue
git bisect good 741a76b350897604c48fb12beff1c9b77724dc96  # 22:06 G 31 0 0 2 kthread, sched/wait: Fix kthread_parkme() wait-loop
# extra tests with debug options
git bisect  bad 85f1abe0019fcb3ea10df7029056cf42702283a8  # 22:20 B 0 3 16 0 kthread, sched/wait: Fix kthread_parkme() completion issue
# extra tests on HEAD of linux-devel/devel-catchup-201805031730
git bisect  bad d17cc3a1a1091797eff4a671659c15f2dc667996  # 22:25 B 0 13 29 0 0day head guard for 'devel-catchup-201805031730'
# extra tests on tree/branch tip/sched/urgent
git bisect  bad 85f1abe0019fcb3ea10df7029056cf42702283a8  # 22:33 B 0 26 39 0 kthread, sched/wait: Fix kthread_parkme() completion issue
# extra tests with first bad commit reverted
git bisect good 8058ae7fe7b61a67f8a7da7013e4c3d7fa5c0ba8  # 23:39 G 11 0 0 1 Revert "kthread, sched/wait: Fix kthread_parkme() completion issue"
# extra tests on tree/branch tip/master
git bisect  bad 12bc056bd1170303f849954e8e566488acfc7202  # 23:51 B 0 11 35 11 Merge branch 'sched/urgent'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/lkp                          Intel Corporation