On Thu, Feb 5, 2015 at 7:45 PM, Paul E. McKenney wrote:
> On Thu, Feb 05, 2015 at 10:35:33AM -0800, Dave Hansen wrote:
>> On 02/05/2015 10:34 AM, Paul E. McKenney wrote:
>> >> > Did I actually need to be
>> >> > onlining/offlining CPUs to hit the splat that Sedat was reporting?
>> > Yep, you do need to offline at least one CPU to hit that splat.
>>
>> Heh, do we need a debugging mode that will randomly offline/online CPUs? :)
>
> For that, kernel/rcu/rcutorture.c and kernel/locking/locktorture.c
> are your friends. ;-)
>
> The problem is that I only run RCU-relevant combinations of Kconfigs,
> which means that I missed the ones that Sedat used to find this problem.
> So I guess it is a good thing that others run -next testing.
>

[ Revived by a voltaren resinat pill... ]

I reverted "x86/mm: Omit switch_mm() tracing for offline CPUs"

...and...

applied "tlb: Don't do trace_tlb_flush() on offline CPUs"

...in my build-dir.
( I did not build from scratch, but re-invoking make "updated" the
files touched by Steven's patch, see attached build-log. )

Unfortunately, the call-trace remains when offlining cpu1.
( It's good to see it's reproducible. )

root# echo 0 > /sys/devices/system/cpu/cpu1/online

[ 121.652796] intel_pstate CPU 1 exiting
[ 121.666272]
[ 121.666274] ===============================
[ 121.666274] [ INFO: suspicious RCU usage. ]
[ 121.666277] 3.19.0-rc7-next-20150204.7-iniza-small #4 Not tainted
[ 121.666278] -------------------------------
[ 121.666280] include/trace/events/tlb.h:37 suspicious rcu_dereference_check() usage!
[ 121.666281]
[ 121.666281] other info that might help us debug this:
[ 121.666281]
[ 121.666282]
[ 121.666282] RCU used illegally from offline CPU!
[ 121.666282] rcu_scheduler_active = 1, debug_locks = 0
[ 121.666283] no locks held by swapper/1/0.
[ 121.666284]
[ 121.666284] stack backtrace:
[ 121.666287] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.7-iniza-small #4
[ 121.666288] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
[ 121.666293] 0000000000000001 ffff88011a44fe18 ffffffff817e39cd 0000000000000011
[ 121.666296] ffff88011a448290 ffff88011a44fe48 ffffffff810d6af7 ffff8800d3dfaac0
[ 121.666299] 0000000000000001 ffffffff81d32ce0 0000000000000005 ffff88011a44fe78
[ 121.666300] Call Trace:
[ 121.666308] [] dump_stack+0x4c/0x65
[ 121.666313] [] lockdep_rcu_suspicious+0xe7/0x120
[ 121.666318] [] idle_task_exit+0x1c9/0x260
[ 121.666322] [] play_dead_common+0xe/0x50
[ 121.666325] [] native_play_dead+0x15/0x140
[ 121.666330] [] arch_cpu_idle_dead+0xf/0x20
[ 121.666333] [] cpu_startup_entry+0x37e/0x580
[ 121.666336] [] start_secondary+0x140/0x150
[ 121.666744] smpboot: CPU 1 is now offline

From the RCU point of view, is this now safe?
But is another area (linux-pm?) still affected?

I will test "vanilla" pm-next to see whether the problem exists with
intel_pstate, as suggested by Rafael.
Hmmm, not sure how I can get the pm-next code that went into
next-20150204, as linux-pm.git#linux-next has since been fed new
stuff.

- Sedat -