From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751429AbaLQT2D (ORCPT ); Wed, 17 Dec 2014 14:28:03 -0500
Received: from e35.co.us.ibm.com ([32.97.110.153]:56732 "EHLO
	e35.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751101AbaLQT2A (ORCPT );
	Wed, 17 Dec 2014 14:28:00 -0500
Date: Wed, 17 Dec 2014 11:27:53 -0800
From: "Paul E. McKenney"
To: Arun KS
Cc: "linux-kernel@vger.kernel.org" , josh@joshtriplett.org,
	rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
	laijs@cn.fujitsu.com
Subject: Re: [RCU] kernel hangs in wait_rcu_gp during suspend path
Message-ID: <20141217192753.GS5310@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 14121719-0013-0000-0000-00000727B25C
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Dec 16, 2014 at 11:59:07AM +0530, Arun KS wrote:
> Hello,
>
> I dug a little deeper to understand the situation.
> All other CPUs are already in the idle thread.
> As per my understanding, for the grace period to end, at least one of
> the following should happen on all online CPUs:
>
> 1. a context switch.
> 2. user space switch.
> 3. switch to idle thread.

This is the case for rcu_sched, and the other flavors vary a bit.

> In this situation, since all the other cores are already idle, none
> of the above is met on all online cores.
> So the grace period is getting extended and never finishes. Below is the
> state of the runqueue when the hang happens.
> --------------start------------------------------------
> crash> runq
> CPU 0 [OFFLINE]
>
> CPU 1 [OFFLINE]
>
> CPU 2 [OFFLINE]
>
> CPU 3 [OFFLINE]
>
> CPU 4 RUNQUEUE: c3192e40
>   CURRENT: PID: 0      TASK: f0874440  COMMAND: "swapper/4"
>   RT PRIO_ARRAY: c3192f20
>      [no tasks queued]
>   CFS RB_ROOT: c3192eb0
>      [no tasks queued]
>
> CPU 5 RUNQUEUE: c31a0e40
>   CURRENT: PID: 0      TASK: f0874980  COMMAND: "swapper/5"
>   RT PRIO_ARRAY: c31a0f20
>      [no tasks queued]
>   CFS RB_ROOT: c31a0eb0
>      [no tasks queued]
>
> CPU 6 RUNQUEUE: c31aee40
>   CURRENT: PID: 0      TASK: f0874ec0  COMMAND: "swapper/6"
>   RT PRIO_ARRAY: c31aef20
>      [no tasks queued]
>   CFS RB_ROOT: c31aeeb0
>      [no tasks queued]
>
> CPU 7 RUNQUEUE: c31bce40
>   CURRENT: PID: 0      TASK: f0875400  COMMAND: "swapper/7"
>   RT PRIO_ARRAY: c31bcf20
>      [no tasks queued]
>   CFS RB_ROOT: c31bceb0
>      [no tasks queued]
> --------------end------------------------------------
>
> If my understanding is correct the below patch should help, because it
> will expedite grace periods during suspend,
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d1d74d14e98a6be740a6f12456c7d9ad47be9c9c

I believe that we already covered this, but I do suggest that you give
it a try.

> But I wonder why it was not taken to stable trees. Can we take it?
> Appreciate your help.

I have no objection to your taking it, but have you tried it yet?
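
For readers following the thread, "expedite grace periods during suspend"
can be pictured roughly as below. This is a user-space toy model, not the
code from the commit linked above: the event names, the flag, and both
functions are hypothetical and exist only for this sketch. The assumption
is simply that a switch is turned on when suspend preparation starts and
turned off again once suspend is over, and that waiters consult the switch
to choose the faster path.

```c
#include <stdbool.h>

/* Hypothetical event names for this sketch; not the kernel's constants. */
enum toy_pm_event {
    TOY_SUSPEND_PREPARE,
    TOY_POST_SUSPEND,
    TOY_OTHER_EVENT
};

/* Toy stand-in for a global "use expedited grace periods" switch. */
static bool toy_rcu_expedited = false;

/* Notifier-style callback: flip the switch around the suspend window. */
static void toy_pm_notify(enum toy_pm_event ev)
{
    switch (ev) {
    case TOY_SUSPEND_PREPARE:
        toy_rcu_expedited = true;   /* suspend is starting: go fast */
        break;
    case TOY_POST_SUSPEND:
        toy_rcu_expedited = false;  /* suspend is over: back to normal */
        break;
    default:
        break;                      /* unrelated events change nothing */
    }
}

/* Tells a waiter whether it would take the expedited path right now. */
static bool toy_uses_expedited_path(void)
{
    return toy_rcu_expedited;
}
```

In this model a caller would invoke toy_pm_notify(TOY_SUSPEND_PREPARE)
before taking CPUs offline, so every grace-period wait on the suspend path
sees toy_uses_expedited_path() return true.
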
							Thanx, Paul

> Thanks,
> Arun
>
> On Mon, Dec 15, 2014 at 10:34 PM, Arun KS wrote:
> > Hi,
> >
> > Here is the backtrace of the process hanging in wait_rcu_gp,
> >
> > PID: 247 TASK: e16e7380 CPU: 4 COMMAND: "kworker/u16:5"
> > #0 [] (__schedule) from []
> > #1 [] (schedule_timeout) from []
> > #2 [] (wait_for_common) from []
> > #3 [] (wait_rcu_gp) from []
> > #4 [] (atomic_notifier_chain_unregister) from []
> > #5 [] (cpufreq_interactive_disable_sched_input) from []
> > #6 [] (cpufreq_governor_interactive) from []
> > #7 [] (__cpufreq_governor) from []
> > #8 [] (__cpufreq_remove_dev_finish) from []
> > #9 [] (cpufreq_cpu_callback) from []
> > #10 [] (notifier_call_chain) from []
> > #11 [] (__cpu_notify) from []
> > #12 [] (cpu_notify_nofail) from []
> > #13 [] (_cpu_down) from []
> > #14 [] (disable_nonboot_cpus) from []
> > #15 [] (suspend_devices_and_enter) from []
> > #16 [] (pm_suspend) from []
> > #17 [] (try_to_suspend) from []
> > #18 [] (process_one_work) from []
> > #19 [] (worker_thread) from []
> > #20 [] (kthread) from []
> >
> > Will this patch help here?
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d1d74d14e98a6be740a6f12456c7d9ad47be9c9c
> >
> > I couldn't really understand why it got stuck in synchronize_rcu().
> > Please give some pointers to debug this further.
> >
> > Below are the enabled configs related to RCU.
> >
> > CONFIG_TREE_PREEMPT_RCU=y
> > CONFIG_PREEMPT_RCU=y
> > CONFIG_RCU_STALL_COMMON=y
> > CONFIG_RCU_FANOUT=32
> > CONFIG_RCU_FANOUT_LEAF=16
> > CONFIG_RCU_FAST_NO_HZ=y
> > CONFIG_RCU_CPU_STALL_TIMEOUT=21
> > CONFIG_RCU_CPU_STALL_VERBOSE=y
> >
> > Kernel version is 3.10.28
> > Architecture is ARM
> >
> > Thanks,
> > Arun
>
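
The grace-period condition discussed earlier in this thread (every online
CPU must pass through a context switch, a switch to user space, or a switch
to the idle thread) can be sketched as simple bitmask bookkeeping. This is
a toy model under that stated assumption only: the struct and the toy_gp_*
names are hypothetical, it is limited to 32 CPUs, and it does not model how
a real RCU implementation accounts for CPUs that were already idle when the
grace period began.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy grace-period tracker: one bit per CPU, at most 32 CPUs. */
struct toy_gp {
    uint32_t online;   /* CPUs that must report before the grace period ends */
    uint32_t reported; /* CPUs that have passed a quiescent state so far */
};

/* Begin a grace period that waits on every CPU set in online_mask. */
static void toy_gp_start(struct toy_gp *gp, uint32_t online_mask)
{
    gp->online = online_mask;
    gp->reported = 0;
}

/* Record a context switch, user-space switch, or idle entry on `cpu`. */
static void toy_gp_note_quiescent(struct toy_gp *gp, unsigned int cpu)
{
    if (cpu < 32)
        gp->reported |= (UINT32_C(1) << cpu);
}

/* The grace period is over once no online CPU is still unreported. */
static bool toy_gp_done(const struct toy_gp *gp)
{
    return (gp->online & ~gp->reported) == 0;
}
```

With the runqueue state shown in this thread (CPUs 0-3 offline, CPUs 4-7
online), the online mask would be 0xF0, and toy_gp_done() stays false until
all four of CPUs 4 through 7 have been noted.
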