On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote: >Hi Linus, > >Up to now we see the below boot error/warnings when testing v4.14-rc6. > >They hit the RC release mainly due to various imperfections in 0day's >auto bisection. So I manually list them here and CC the likely easy to >debug ones to the corresponding maintainers in the followup emails. > >boot_successes: 4700 >boot_failures: 247 [...] >WARNING:at_arch/x86/kernel/smp.c:#native_smp_send_reschedule: 1 That's bisected to an update of RCU torture parameter: commit 492b95e59735998312f678d77a2d5fe20af6b0b9 Author: Paul E. McKenney AuthorDate: Fri Apr 21 16:09:15 2017 -0700 Commit: Paul E. McKenney CommitDate: Thu Jun 8 08:25:31 2017 -0700 rcuperf: Set more user-friendly defaults Common-case use of rcuperf must set rcuperf.nreaders=0 and if not built as a module, rcuperf.shutdown. This commit therefore sets the default for rcuperf.nreaders to zero and sets the default for rcuperf.shutdown to zero if rcuperf is built as a module and to one otherwise. Signed-off-by: Paul E. McKenney --- kernel/rcu/rcuperf.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/rcu/rcuperf.c b/kernel/rcu/rcuperf.c index 5158ddba67162..49c8ed6bd2fda 100644 --- a/kernel/rcu/rcuperf.c +++ b/kernel/rcu/rcuperf.c @@ -63,9 +63,10 @@ torture_param(bool, gp_async, false, "Use asynchronous GP wait primitives"); torture_param(int, gp_async_max, 1000, "Max # outstanding waits per reader"); torture_param(bool, gp_exp, false, "Use expedited GP wait primitives"); torture_param(int, holdoff, 10, "Holdoff time before test start (s)"); -torture_param(int, nreaders, -1, "Number of RCU reader threads"); +torture_param(int, nreaders, 0, "Number of RCU reader threads"); torture_param(int, nwriters, -1, "Number of RCU updater threads"); -torture_param(bool, shutdown, false, "Shutdown at end of performance tests."); +torture_param(bool, shutdown, !IS_ENABLED(MODULE), + "Shutdown at end of performance tests."); torture_param(bool, verbose, true, "Enable verbose debugging printk()s"); static char *perf_type = "rcu"; Attached the dmesg. Here are the call traces -- note that the first WARNING only shows up in latest mainline tree, so might be irrelevant to the above commit. [ 24.691613] rcu-perf: 1 writer-duration: 97 13680809 [ 24.692466] rcu-perf: 1 writer-duration: 98 9561892 [ 24.693192] rcu-perf: 1 writer-duration: 99 7970345 [ 24.693816] rcu-perf: 1 writer-duration: 100 9788023 [ 24.700210] no ifx modem active; [ 24.701420] ACPI: Preparing to enter system sleep state S5 [ 24.704534] reboot: Power down [ 24.705645] acpi_power_off called [ 24.713350] ------------[ cut here ]------------ [ 24.713804] WARNING: CPU: 0 PID: 11 at kernel/sched/core.c:1179 set_task_cpu+0x157/0x285 [ 24.714779] Modules linked in: i2c_piix4 input_leds psmouse i2c_core button [ 24.715437] CPU: 0 PID: 11 Comm: migration/0 Not tainted 4.14.0-rc6-00077-g11224e1 #1 [ 24.716188] task: ffff880032955000 task.stack: ffff88003294c000 [ 24.716757] RIP: 0010:set_task_cpu+0x157/0x285 [ 24.717193] RSP: 0018:ffff88003294fe08 EFLAGS: 00010002 [ 24.717746] RAX: 0000000000000001 RBX: ffff880031449000 RCX: 0000000000000000 [ 24.718416] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff82e2ad60 [ 24.719091] RBP: ffff88003294fe20 R08: 0000000000000000 R09: 0000000000000000 [ 24.719789] R10: ffff880014483b60 R11: 0000000000000002 R12: 0000000000000001 [ 24.720477] R13: 0000000000000001 R14: ffff88003294fe68 R15: ffff8800347eb300 [ 24.721155] FS: 0000000000000000(0000) GS:ffff880034600000(0000) knlGS:0000000000000000 [ 24.721947] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 24.722479] CR2: 00007f1cdeec8fd0 CR3: 000000003722d000 CR4: 00000000000006b0 [ 24.723144] Call Trace: [ 24.723388] move_queued_task+0x8c/0x108 [ 24.723758] migration_cpu_stop+0x82/0xb7 [ 24.724131] ? sched_ttwu_pending+0x78/0x78 [ 24.724748] ? sort_range+0x17/0x17 [ 24.725214] cpu_stopper_thread+0x87/0x14c [ 24.725785] ? sort_range+0x17/0x17 [ 24.726194] smpboot_thread_fn+0x2a9/0x2c0 [ 24.726718] kthread+0xf8/0x100 [ 24.727113] ? kthread_unpark+0x5a/0x5a [ 24.727532] ret_from_fork+0x25/0x30 [ 24.727856] Code: 0f a3 2d 85 d2 e3 01 41 0f 92 c5 41 83 f5 01 31 c9 31 d2 45 0f b6 ed 48 c7 c7 60 ad e2 82 44 89 ee e8 53 24 08 00 45 85 ed 74 02 <0f> ff 31 c9 31 d2 44 89 ee 48 c7 c7 30 ad e2 82 45 31 ed e8 36 [ 24.729721] ---[ end trace b7ae64ce6fa90a4f ]--- [ 24.730154] sched: Unexpected reschedule of offline CPU#1! [ 24.730700] ------------[ cut here ]------------ [ 24.731133] WARNING: CPU: 0 PID: 11 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x69/0x9e [ 24.732162] Modules linked in: i2c_piix4 input_leds psmouse i2c_core button [ 24.732859] CPU: 0 PID: 11 Comm: migration/0 Tainted: G W 4.14.0-rc6-00077-g11224e1 #1 [ 24.733690] task: ffff880032955000 task.stack: ffff88003294c000 [ 24.734449] RIP: 0010:native_smp_send_reschedule+0x69/0x9e [ 24.735158] RSP: 0018:ffff88003294fdf8 EFLAGS: 00010092 [ 24.735704] RAX: 000000000000002e RBX: 0000000000000001 RCX: 0000000000000000 [ 24.736382] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82e32ef0 [ 24.737009] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 [ 24.737711] R10: ffff88003294fd30 R11: 0000000000000005 R12: ffff880031449000 [ 24.738370] R13: 0000000000000000 R14: ffff88003294fe68 R15: ffff8800347eb300 [ 24.738997] FS: 0000000000000000(0000) GS:ffff880034600000(0000) knlGS:0000000000000000 [ 24.739754] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 24.740261] CR2: 00007f1cdeec8fd0 CR3: 000000003722d000 CR4: 00000000000006b0 [ 24.740921] Call Trace: [ 24.741144] check_preempt_curr+0x47/0xc1 [ 24.741534] move_queued_task+0xfa/0x108 [ 24.741886] migration_cpu_stop+0x82/0xb7 [ 24.742240] ? sched_ttwu_pending+0x78/0x78 [ 24.742649] ? sort_range+0x17/0x17 [ 24.742974] cpu_stopper_thread+0x87/0x14c [ 24.743401] ? sort_range+0x17/0x17 [ 24.743731] smpboot_thread_fn+0x2a9/0x2c0 [ 24.744111] kthread+0xf8/0x100 [ 24.744414] ? kthread_unpark+0x5a/0x5a [ 24.744769] ret_from_fork+0x25/0x30 [ 24.745116] Code: ed 74 44 b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 50 c4 e1 82 e8 e4 d4 0d 00 89 de 48 c7 c7 76 4c 86 81 31 c0 e8 e1 5e 08 00 <0f> ff 5b 5d b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 20 c4 [ 24.746913] ---[ end trace b7ae64ce6fa90a50 ]--- Elapsed time: 2040 BUG: kernel hang in test stage Thanks, Fengguang