Hi Paul, On Mon, Mar 28, 2016 at 06:29:32AM -0700, Paul E. McKenney wrote: > On Mon, Mar 28, 2016 at 08:28:51AM +0200, Peter Zijlstra wrote: > > On Sun, Mar 27, 2016 at 02:09:14PM -0700, Paul E. McKenney wrote: > > > > > > Does that system have MONITOR/MWAIT errata? > > > > > > On the off-chance that this question was also directed at me, > > > > Hehe, it wasn't, however, since we're here.. > > > > > here is > > > what I am running on. I am running in a qemu/KVM virtual machine, in > > > case that matters. > > > > Have you actually tried on real proper hardware? Does it still reproduce > > there? > > Ross has, but I have not, given that I have a shared system on the one > hand and a single-socket (four core, eight hardware thread) laptop on > the other that has even longer reproduction times. The repeat-by is > as follows: > > o Build a kernel with the following Kconfigs: > > CONFIG_SMP=y > CONFIG_NR_CPUS=16 > CONFIG_PREEMPT_NONE=n > CONFIG_PREEMPT_VOLUNTARY=n > CONFIG_PREEMPT=y > # This should result in CONFIG_PREEMPT_RCU=y > CONFIG_HZ_PERIODIC=y > CONFIG_NO_HZ_IDLE=n > CONFIG_NO_HZ_FULL=n > CONFIG_RCU_TRACE=y > CONFIG_HOTPLUG_CPU=y > CONFIG_RCU_FANOUT=2 > CONFIG_RCU_FANOUT_LEAF=2 > CONFIG_RCU_NOCB_CPU=n > CONFIG_DEBUG_LOCK_ALLOC=n > CONFIG_RCU_BOOST=y > CONFIG_RCU_KTHREAD_PRIO=2 > CONFIG_DEBUG_OBJECTS_RCU_HEAD=n > CONFIG_RCU_EXPERT=y > CONFIG_RCU_TORTURE_TEST=y > CONFIG_PRINTK_TIME=y > CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP=y > CONFIG_RCU_TORTURE_TEST_SLOW_INIT=y > CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT=y > > If desired, you can instead build with CONFIG_RCU_TORTURE_TEST=m > and modprobe/insmod the module manually. > > o Find a two-socket x86 system or larger, with at least 16 CPUs. > > o Boot the kernel with the following kernel boot parameters: > > rcutorture.onoff_interval=1 rcutorture.onoff_holdoff=30 > > The onoff_holdoff is only needed for CONFIG_RCU_TORTURE_TEST=y. > When manually setting up the module, you get the holdoff for > free, courtesy of human timescales. > I ran several tests similar to this setup in a guest, the guest was running in a 12-core Xeon X5690 host with kernel 4.4.5. The guest was given 16 VCPUs and 2G memory. Rather than RCU stall warnings, I was able to see a bug like this nearly *everytime*: [ 308.291635] BUG: unable to handle kernel NULL pointer dereference at (null) [ 308.292063] IP: [] migrate_timer_list+0x32/0xc0 [ 308.292063] PGD 0 [ 308.292063] Oops: 0002 [#1] PREEMPT SMP [ 308.292063] Modules linked in: [ 308.292063] CPU: 11 PID: 798 Comm: torture_onoff Not tainted 4.5.0-rc1+ #14 [ 308.292063] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-20160216_104851-anatol 04/01/2014 [ 308.292063] task: ffff88007cafcb00 ti: ffff88007c6f4000 task.ti: ffff88007c6f4000 [ 308.292063] RIP: 0010:[] [] migrate_timer_list+0x32/0xc0 [ 308.292063] RSP: 0018:ffff88007c6f7c98 EFLAGS: 00010086 [ 308.292063] RAX: dead000000000200 RBX: ffff88007fb6cc00 RCX: ffff88007fa2cc00 [ 308.292063] RDX: 0000000000000000 RSI: ffff88007fa2d6d0 RDI: ffff88007fb6cc00 [ 308.292063] RBP: ffff88007c6f7cc0 R08: 0000000000000000 R09: ffffffff81e4d360 [ 308.292063] R10: ffffea0001f2daa0 R11: 0000000000000000 R12: ffff88007fa2d6d0 [ 308.292063] R13: ffff88007fb6cc00 R14: 000000000000000b R15: ffff88007fa2c420 [ 308.292063] FS: 0000000000000000(0000) GS:ffff88007fb60000(0000) knlGS:0000000000000000 [ 308.292063] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 308.292063] CR2: 0000000000000000 CR3: 0000000001e0f000 CR4: 00000000000006e0 [ 308.292063] Stack: [ 308.292063] ffff88007fb6cc00 ffff88007fa2d6d8 ffff88007fa2dad0 ffff88007fa2d8d0 [ 308.292063] ffff88007fa2d4d8 ffff88007c6f7d08 ffffffff810b95c4 ffff88007fa2cc00 [ 308.292063] ffff88007fa2d648 00000000fffffff0 ffffffff81e4e0c0 0000000000000007 [ 308.292063] Call Trace: [ 308.292063] [] timer_cpu_notify+0xe4/0x160 [ 308.292063] [] notifier_call_chain+0x45/0x70 [ 308.292063] [] __raw_notifier_call_chain+0x9/0x10 [ 308.292063] [] cpu_notify_nofail+0x1e/0x30 [ 308.292063] [] _cpu_down+0x212/0x350 [ 308.292063] [] ? __call_rcu.constprop.74+0x380/0x380 [ 308.292063] [] ? call_rcu_bh+0x20/0x20 [ 308.292063] [] ? __rcu_read_unlock+0x50/0x50 [ 308.292063] [] ? __rcu_read_unlock+0x50/0x50 [ 308.292063] [] cpu_down+0x2f/0x50 [ 308.292063] [] torture_onoff+0x115/0x380 [ 308.292063] [] ? torture_kthread_stopping+0x70/0x70 [ 308.292063] [] kthread+0xd6/0xf0 [ 308.292063] [] ? kthread_worker_fn+0x180/0x180 [ 308.292063] [] ret_from_fork+0x3f/0x70 [ 308.292063] [] ? kthread_worker_fn+0x180/0x180 [ 308.292063] Code: 56 41 55 41 54 53 48 8b 06 44 8b 77 40 48 85 c0 74 53 49 89 fd 49 89 f4 4d 8b 3c 24 66 66 66 66 90 49 8b 07 49 8b 57 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 b8 00 02 00 00 00 00 ad de 4c 89 [ 308.292063] RIP [] migrate_timer_list+0x32/0xc0 [ 308.292063] RSP [ 308.292063] CR2: 0000000000000000 [ 308.292063] ---[ end trace 29cb5d70ad875872 ]--- I'm not sure whether this is relevant, maybe this was caused by my setup? But I attach two kernel logs and my config in case this could help us solve the problem. Another interesting I observed was that the time stamps of printk went back in a while in some run, please note the log named "dmesg.1" in the attachments. A quick analysis of this bug is the null dereference happened at: migrate_timer_list() detach_timer(); __list_del(); WRITE_ONCE(*pprev, next); <- pprev is NULL This seems highly unlikely, but as I said, I hit this almost every time after a few minutes of starting rcutoture. Any idea about whether this could be related to the RCU stall problem, or this is another bug, or this actually happend by design for some reasons? Regards, Boqun > In the absence of instrumentation, I get failures usually within a > couple of hours, though sometimes much longer. With instrumentation, > the sky appears to be the limit. :-/ > > Ross is running on bare metal with no CPU hotplug, so perhaps his setup > is of more immediate interest. He is seeing the same symptoms that I am, > namely a task being repeatedly awakened without actually coming out of > TASK_INTERRUPTIBLE state, let alone running. As you pointed out earlier, > he cannot be seeing the same bug that my crude patch suppresses, but > given that I still see a few failures with that crude patch, it is quite > possible that there is still a common bug. > > Thanx, Paul >