Hi Paul,

On Thu, Dec 18, 2014 at 12:54 AM, Paul E. McKenney wrote:
> On Tue, Dec 16, 2014 at 11:00:20PM +0530, Arun KS wrote:
>> Hello,
>>
>> Adding some more info.
>>
>> Below is the rcu_data data structure corresponding to cpu4.
>
> This shows that RCU is idle.  What was the state of the system at the
> time you collected this data?

The system initiated a suspend sequence and is currently in
disable_nonboot_cpus().  It hotplugged out CPUs 0, 1 and 2 successfully,
and also succeeded in hotplugging out CPU 3.  But while the CPU_POST_DEAD
notifier for CPU 3 was running, another driver tried to unregister an
atomic notifier, which eventually calls synchronize_rcu(), and that hangs
the suspend task.  The backtrace is as follows:

PID: 202 TASK: edcd2a00 CPU: 4 COMMAND: "kworker/u16:4"
 #0 [] (__schedule) from []
 #1 [] (schedule_timeout) from []
 #2 [] (wait_for_common) from []
 #3 [] (wait_rcu_gp) from []
 #4 [] (atomic_notifier_chain_unregister) from []
 #5 [] (cpufreq_interactive_disable_sched_input) from []
 #6 [] (cpufreq_governor_interactive) from []
 #7 [] (__cpufreq_governor) from []
 #8 [] (__cpufreq_remove_dev_finish) from []
 #9 [] (cpufreq_cpu_callback) from []
#10 [] (notifier_call_chain) from []
#11 [] (__cpu_notify) from []
#12 [] (cpu_notify_nofail) from []
#13 [] (_cpu_down) from []
#14 [] (disable_nonboot_cpus) from []
#15 [] (suspend_devices_and_enter) from []
#16 [] (pm_suspend) from []
#17 [] (try_to_suspend) from []
#18 [] (process_one_work) from []
#19 [] (worker_thread) from []
#20 [] (kthread) from []

But the other cores, 4-7, are active.  I can see them going into idle and
coming out of idle because of interrupts, scheduling kworkers, etc.  When
I took the data, all the online cores (4-7) were in idle, as shown below
from the runqueue data structures.
------start--------------
crash> runq
CPU 0 [OFFLINE]

CPU 1 [OFFLINE]

CPU 2 [OFFLINE]

CPU 3 [OFFLINE]

CPU 4 RUNQUEUE: c5439040
  CURRENT: PID: 0 TASK: f0c9d400 COMMAND: "swapper/4"
  RT PRIO_ARRAY: c5439130
     [no tasks queued]
  CFS RB_ROOT: c54390c0
     [no tasks queued]

CPU 5 RUNQUEUE: c5447040
  CURRENT: PID: 0 TASK: f0c9aa00 COMMAND: "swapper/5"
  RT PRIO_ARRAY: c5447130
     [no tasks queued]
  CFS RB_ROOT: c54470c0
     [no tasks queued]

CPU 6 RUNQUEUE: c5455040
  CURRENT: PID: 0 TASK: f0c9ce00 COMMAND: "swapper/6"
  RT PRIO_ARRAY: c5455130
     [no tasks queued]
  CFS RB_ROOT: c54550c0
     [no tasks queued]

CPU 7 RUNQUEUE: c5463040
  CURRENT: PID: 0 TASK: f0c9b000 COMMAND: "swapper/7"
  RT PRIO_ARRAY: c5463130
     [no tasks queued]
  CFS RB_ROOT: c54630c0
     [no tasks queued]
------end--------------

One strange thing I can see is that rcu_read_lock_nesting for the idle
tasks running on CPU 5 and CPU 6 is set to 1:

PID: 0 TASK: f0c9d400 CPU: 4 COMMAND: "swapper/4"
  rcu_read_lock_nesting = 0,

PID: 0 TASK: f0c9aa00 CPU: 5 COMMAND: "swapper/5"
  rcu_read_lock_nesting = 1,

PID: 0 TASK: f0c9ce00 CPU: 6 COMMAND: "swapper/6"
  rcu_read_lock_nesting = 1,

PID: 0 TASK: f0c9b000 CPU: 7 COMMAND: "swapper/7"
  rcu_read_lock_nesting = 0,

Does this mean that the current grace period (which the suspend thread is
waiting on) is getting extended indefinitely?

Also attaching the per-CPU rcu_data for the online and offline cores.
Thanks,
Arun

>
> 							Thanx, Paul
>
>> struct rcu_data {
>>   completed = 5877,
>>   gpnum = 5877,
>>   passed_quiesce = true,
>>   qs_pending = false,
>>   beenonline = true,
>>   preemptible = true,
>>   mynode = 0xc117f340 ,
>>   grpmask = 16,
>>   nxtlist = 0xedaaec00,
>>   nxttail = {0xc54366c4, 0xe84d350c, 0xe84d350c, 0xe84d350c},
>>   nxtcompleted = {4294967035, 5878, 5878, 5878},
>>   qlen_lazy = 105,
>>   qlen = 415,
>>   qlen_last_fqs_check = 0,
>>   n_cbs_invoked = 86323,
>>   n_nocbs_invoked = 0,
>>   n_cbs_orphaned = 0,
>>   n_cbs_adopted = 139,
>>   n_force_qs_snap = 0,
>>   blimit = 10,
>>   dynticks = 0xc5436758,
>>   dynticks_snap = 7582140,
>>   dynticks_fqs = 41,
>>   offline_fqs = 0,
>>   n_rcu_pending = 59404,
>>   n_rp_qs_pending = 5,
>>   n_rp_report_qs = 4633,
>>   n_rp_cb_ready = 32,
>>   n_rp_cpu_needs_gp = 41088,
>>   n_rp_gp_completed = 2844,
>>   n_rp_gp_started = 1150,
>>   n_rp_need_nothing = 9657,
>>   barrier_head = {
>>     next = 0x0,
>>     func = 0x0
>>   },
>>   oom_head = {
>>     next = 0x0,
>>     func = 0x0
>>   },
>>   cpu = 4,
>>   rsp = 0xc117f340
>> }
>>
>>
>>
>> Also pasting the complete rcu_preempt_state.
>>
>>
>>
>> rcu_preempt_state = $9 = {
>>   node = {{
>>       lock = {
>>         raw_lock = {
>>           {
>>             slock = 3129850509,
>>             tickets = {
>>               owner = 47757,
>>               next = 47757
>>             }
>>           }
>>         },
>>         magic = 3735899821,
>>         owner_cpu = 4294967295,
>>         owner = 0xffffffff
>>       },
>>       gpnum = 5877,
>>       completed = 5877,
>>       qsmask = 0,
>>       expmask = 0,
>>       qsmaskinit = 240,
>>       grpmask = 0,
>>       grplo = 0,
>>       grphi = 7,
>>       grpnum = 0 '\000',
>>       level = 0 '\000',
>>       parent = 0x0,
>>       blkd_tasks = {
>>         next = 0xc117f378 ,
>>         prev = 0xc117f378
>>       },
>>       gp_tasks = 0x0,
>>       exp_tasks = 0x0,
>>       need_future_gp = {1, 0},
>>       fqslock = {
>>         raw_lock = {
>>           {
>>             slock = 0,
>>             tickets = {
>>               owner = 0,
>>               next = 0
>>             }
>>           }
>>         },
>>         magic = 3735899821,
>>         owner_cpu = 4294967295,
>>         owner = 0xffffffff
>>       }
>>     }},
>>   level = {0xc117f340 },
>>   levelcnt = {1, 0, 0, 0, 0},
>>   levelspread = "\b",
>>   rda = 0xc115e6b0 ,
>>   call = 0xc01975ac ,
>>   fqs_state = 0 '\000',
>>   boost = 0 '\000',
>>   gpnum = 5877,
>>   completed = 5877,
>>   gp_kthread = 0xf0c9e600,
>>   gp_wq = {
>>     lock = {
>>       {
>>         rlock = {
>>           raw_lock = {
>>             {
>>               slock = 2160230594,
>>               tickets = {
>>                 owner = 32962,
>>                 next = 32962
>>               }
>>             }
>>           },
>>           magic = 3735899821,
>>           owner_cpu = 4294967295,
>>           owner = 0xffffffff
>>         }
>>       }
>>     },
>>     task_list = {
>>       next = 0xf0cd1f20,
>>       prev = 0xf0cd1f20
>>     }
>>   },
>>   gp_flags = 1,
>>   orphan_lock = {
>>     raw_lock = {
>>       {
>>         slock = 327685,
>>         tickets = {
>>           owner = 5,
>>           next = 5
>>         }
>>       }
>>     },
>>     magic = 3735899821,
>>     owner_cpu = 4294967295,
>>     owner = 0xffffffff
>>   },
>>   orphan_nxtlist = 0x0,
>>   orphan_nxttail = 0xc117f490 ,
>>   orphan_donelist = 0x0,
>>   orphan_donetail = 0xc117f498 ,
>>   qlen_lazy = 0,
>>   qlen = 0,
>>   onoff_mutex = {
>>     count = {
>>       counter = 1
>>     },
>>     wait_lock = {
>>       {
>>         rlock = {
>>           raw_lock = {
>>             {
>>               slock = 811479134,
>>               tickets = {
>>                 owner = 12382,
>>                 next = 12382
>>               }
>>             }
>>           },
>>           magic = 3735899821,
>>           owner_cpu = 4294967295,
>>           owner = 0xffffffff
>>         }
>>       }
>>     },
>>     wait_list = {
>>       next = 0xc117f4bc ,
>>       prev = 0xc117f4bc
>>     },
>>     owner = 0x0,
>>     name = 0x0,
>>     magic = 0xc117f4a8
>>   },
>>   barrier_mutex = {
>>     count = {
>>       counter = 1
>>     },
>>     wait_lock = {
>>       {
>>         rlock = {
>>           raw_lock = {
>>             {
>>               slock = 0,
>>               tickets = {
>>                 owner = 0,
>>                 next = 0
>>               }
>>             }
>>           },
>>           magic = 3735899821,
>>           owner_cpu = 4294967295,
>>           owner = 0xffffffff
>>         }
>>       }
>>     },
>>     wait_list = {
>>       next = 0xc117f4e4 ,
>>       prev = 0xc117f4e4
>>     },
>>     owner = 0x0,
>>     name = 0x0,
>>     magic = 0xc117f4d0
>>   },
>>   barrier_cpu_count = {
>>     counter = 0
>>   },
>>   barrier_completion = {
>>     done = 0,
>>     wait = {
>>       lock = {
>>         {
>>           rlock = {
>>             raw_lock = {
>>               {
>>                 slock = 0,
>>                 tickets = {
>>                   owner = 0,
>>                   next = 0
>>                 }
>>               }
>>             },
>>             magic = 0,
>>             owner_cpu = 0,
>>             owner = 0x0
>>           }
>>         }
>>       },
>>       task_list = {
>>         next = 0x0,
>>         prev = 0x0
>>       }
>>     }
>>   },
>>   n_barrier_done = 0,
>>   expedited_start = {
>>     counter = 0
>>   },
>>   expedited_done = {
>>     counter = 0
>>   },
>>   expedited_wrap = {
>>     counter = 0
>>   },
>>   expedited_tryfail = {
>>     counter = 0
>>   },
>>   expedited_workdone1 = {
>>     counter = 0
>>   },
>>   expedited_workdone2 = {
>>     counter = 0
>>   },
>>   expedited_normal = {
>>     counter = 0
>>   },
>>   expedited_stoppedcpus = {
>>     counter = 0
>>   },
>>   expedited_done_tries = {
>>     counter = 0
>>   },
>>   expedited_done_lost = {
>>     counter = 0
>>   },
>>   expedited_done_exit = {
>>     counter = 0
>>   },
>>   jiffies_force_qs = 4294963917,
>>   n_force_qs = 4028,
>>   n_force_qs_lh = 0,
>>   n_force_qs_ngp = 0,
>>   gp_start = 4294963911,
>>   jiffies_stall = 4294966011,
>>   gp_max = 17,
>>   name = 0xc0d833ab "rcu_preempt",
>>   abbr = 112 'p',
>>   flavors = {
>>     next = 0xc117f2ec ,
>>     prev = 0xc117f300
>>   },
>>   wakeup_work = {
>>     flags = 3,
>>     llnode = {
>>       next = 0x0
>>     },
>>     func = 0xc0195aa8
>>   }
>> }
>>
>> Hope this helps.
>>
>> Thanks,
>> Arun
>>
>>
>> On Tue, Dec 16, 2014 at 11:59 AM, Arun KS wrote:
>> > Hello,
>> >
>> > I dug a little deeper to understand the situation.
>> > All the other cpus are already in the idle thread.
>> > As per my understanding, for the grace period to end, at least one of
>> > the following should happen on all online cpus:
>> >
>> > 1. a context switch.
>> > 2. a switch to user space.
>> > 3. a switch to the idle thread.
>> >
>> > In this situation, since all the other cores are already idle, none
>> > of the above are met on the online cores.
>> > So the grace period is getting extended and never finishes.  Below is
>> > the state of the runqueues when the hang happens.
>> > --------------start------------------------------------
>> > crash> runq
>> > CPU 0 [OFFLINE]
>> >
>> > CPU 1 [OFFLINE]
>> >
>> > CPU 2 [OFFLINE]
>> >
>> > CPU 3 [OFFLINE]
>> >
>> > CPU 4 RUNQUEUE: c3192e40
>> >   CURRENT: PID: 0 TASK: f0874440 COMMAND: "swapper/4"
>> >   RT PRIO_ARRAY: c3192f20
>> >      [no tasks queued]
>> >   CFS RB_ROOT: c3192eb0
>> >      [no tasks queued]
>> >
>> > CPU 5 RUNQUEUE: c31a0e40
>> >   CURRENT: PID: 0 TASK: f0874980 COMMAND: "swapper/5"
>> >   RT PRIO_ARRAY: c31a0f20
>> >      [no tasks queued]
>> >   CFS RB_ROOT: c31a0eb0
>> >      [no tasks queued]
>> >
>> > CPU 6 RUNQUEUE: c31aee40
>> >   CURRENT: PID: 0 TASK: f0874ec0 COMMAND: "swapper/6"
>> >   RT PRIO_ARRAY: c31aef20
>> >      [no tasks queued]
>> >   CFS RB_ROOT: c31aeeb0
>> >      [no tasks queued]
>> >
>> > CPU 7 RUNQUEUE: c31bce40
>> >   CURRENT: PID: 0 TASK: f0875400 COMMAND: "swapper/7"
>> >   RT PRIO_ARRAY: c31bcf20
>> >      [no tasks queued]
>> >   CFS RB_ROOT: c31bceb0
>> >      [no tasks queued]
>> > --------------end------------------------------------
>> >
>> > If my understanding is correct, the patch below should help, because
>> > it will expedite grace periods during suspend:
>> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d1d74d14e98a6be740a6f12456c7d9ad47be9c9c
>> >
>> > But I wonder why it was not taken into the stable trees.  Can we take it?
>> > Appreciate your help.
>> >
>> > Thanks,
>> > Arun
>> >
>> > On Mon, Dec 15, 2014 at 10:34 PM, Arun KS wrote:
>> >> Hi,
>> >>
>> >> Here is the backtrace of the process hanging in wait_rcu_gp:
>> >>
>> >> PID: 247 TASK: e16e7380 CPU: 4 COMMAND: "kworker/u16:5"
>> >>  #0 [] (__schedule) from []
>> >>  #1 [] (schedule_timeout) from []
>> >>  #2 [] (wait_for_common) from []
>> >>  #3 [] (wait_rcu_gp) from []
>> >>  #4 [] (atomic_notifier_chain_unregister) from []
>> >>  #5 [] (cpufreq_interactive_disable_sched_input) from []
>> >>  #6 [] (cpufreq_governor_interactive) from []
>> >>  #7 [] (__cpufreq_governor) from []
>> >>  #8 [] (__cpufreq_remove_dev_finish) from []
>> >>  #9 [] (cpufreq_cpu_callback) from []
>> >> #10 [] (notifier_call_chain) from []
>> >> #11 [] (__cpu_notify) from []
>> >> #12 [] (cpu_notify_nofail) from []
>> >> #13 [] (_cpu_down) from []
>> >> #14 [] (disable_nonboot_cpus) from []
>> >> #15 [] (suspend_devices_and_enter) from []
>> >> #16 [] (pm_suspend) from []
>> >> #17 [] (try_to_suspend) from []
>> >> #18 [] (process_one_work) from []
>> >> #19 [] (worker_thread) from []
>> >> #20 [] (kthread) from []
>> >>
>> >> Will this patch help here?
>> >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d1d74d14e98a6be740a6f12456c7d9ad47be9c9c
>> >>
>> >> I couldn't really understand why it got stuck in synchronize_rcu().
>> >> Please give some pointers to debug this further.
>> >>
>> >> Below are the RCU-related configs enabled:
>> >>
>> >> CONFIG_TREE_PREEMPT_RCU=y
>> >> CONFIG_PREEMPT_RCU=y
>> >> CONFIG_RCU_STALL_COMMON=y
>> >> CONFIG_RCU_FANOUT=32
>> >> CONFIG_RCU_FANOUT_LEAF=16
>> >> CONFIG_RCU_FAST_NO_HZ=y
>> >> CONFIG_RCU_CPU_STALL_TIMEOUT=21
>> >> CONFIG_RCU_CPU_STALL_VERBOSE=y
>> >>
>> >> Kernel version is 3.10.28.
>> >> Architecture is ARM.
>> >>
>> >> Thanks,
>> >> Arun
>> >
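P.S. For what it's worth, my reading of the commit referenced above (paraphrased from memory, so treat the identifiers and details as approximate rather than the literal diff) is that it registers a PM notifier that switches RCU to expedited grace periods for the duration of suspend/resume:

```c
/* Rough paraphrase of the referenced commit, from memory -- not the
 * literal patch.  It forces expedited grace periods while suspending. */
static int rcu_pm_notify(struct notifier_block *self,
			 unsigned long action, void *hcpu)
{
	switch (action) {
	case PM_HIBERNATION_PREPARE:
	case PM_SUSPEND_PREPARE:
		rcu_expedited = 1;	/* synchronize_rcu() takes the fast path */
		break;
	case PM_POST_HIBERNATION:
	case PM_POST_SUSPEND:
		rcu_expedited = 0;	/* back to normal grace periods */
		break;
	default:
		break;
	}
	return NOTIFY_OK;
}
```

If that is right, the synchronize_rcu() inside atomic_notifier_chain_unregister() would be expedited during disable_nonboot_cpus() instead of waiting for a normal grace period, which is why I think it is relevant to this hang.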