linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Qian Cai <cai@lca.pw>
Cc: rcu@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: Boot warning at rcu_check_gp_start_stall()
Date: Wed, 22 Jan 2020 08:50:14 -0800	[thread overview]
Message-ID: <20200122165014.GG2935@paulmck-ThinkPad-P72> (raw)
In-Reply-To: <FD18F879-048F-40E9-B04D-3C189C59181B@lca.pw>

On Wed, Jan 22, 2020 at 12:20:02AM -0500, Qian Cai wrote:
> 
> 
> > On Jan 22, 2020, at 12:07 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > 
> > On Tue, Jan 21, 2020 at 11:16:06PM -0500, Qian Cai wrote:
> >> 
> >> 
> >>> On Jan 21, 2020, at 3:46 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> >>> 
> >>> On Tue, Jan 21, 2020 at 02:09:05PM -0500, Qian Cai wrote:
> >>>>> On Jan 21, 2020, at 11:15 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
> >>>>> On Tue, Jan 21, 2020 at 09:37:13AM -0500, Qian Cai wrote:
> >>>>>>> On Jan 21, 2020, at 9:19 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
> >>>>>>> 
> >>>>>>> One approach would be to boot with rcupdate.rcu_cpu_stall_timeout=300,
> >>>>>>> which would allow more time.
> >>>>>> 
> >>>>>> It works for me if once that warning triggered,  give a bit information about adjusting the parameter when debugging options are on to suppress the warning due to expected long boot.
> >>>>> 
> >>>>> Indeed.  300 seconds as shown above is currently the maximum, but
> >>>>> please let me know if it needs to be increased.  This module parameter
> >>>>> is writable after boot via sysfs, so maybe that could be part of the
> >>>>> workaround.
> >>>>> 
> >>>>>>> Longer term, I could suppress this warning during boot when
> >>>>>>> CONFIG_EFI_PGT_DUMP=y, but that sounds quite specific.  Alternatively,
> >>>>>>> I could provide a Kconfig option that suppressed this during boot
> >>>>>>> that was selected by whatever long-running boot-time Kconfig option
> >>>>>>> needed it.  Yet another approach would be for long-running operations
> >>>>>>> like efi_dump_pagetable() to suppress stalls on entry and re-enable them
> >>>>>>> upon exit.
> >>>>>>> 
> >>>>>>> Thoughts?
> >>>>>> 
> >>>>>> None of the options sounds particularly better for me because there could come up with other options may trigger this, memtest comes in mind, for example. Then, it is a bit of pain to maintain of unknown.
> >>>>> 
> >>>>> I was afraid of that.  ;-)
> >>>>> 
> >>>>> Could you please send me the full dmesg up to that point?  No promises,
> >>>>> but it might well be that I can make some broad-spectrum adjustment
> >>>>> within RCU.  Only one way to find out…
> >>>> 
> >>>> https://cailca.github.io/files/dmesg.txt
> >>> 
> >>> Interesting.
> >>> 
> >>> Does the following (very lightly tested) patch help?
> >> 
> >> Yes, it works fine.
> > 
> > Very good, thank you!  May I apply your Tested-by?
> 
> Just one thing minor,
> 
> Applying: Boot warning at rcu_check_gp_start_stall()
> .git/rebase-apply/patch:66: space before tab in indent.
> 	    		 READ_ONCE(rnp_root->gp_seq_needed)) ||
> warning: 1 line adds whitespace errors.

Good catch, fixed!

> Otherwise,
> 
> Tested-by: Qian Cai <cai@lca.pw>

Applied, thank you!

							Thanx, Paul

> >>> ------------------------------------------------------------------------
> >>> 
> >>> commit fb21277f8f1c5cc40a8d41da2db4b0c499459821
> >>> Author: Paul E. McKenney <paulmck@kernel.org>
> >>> Date:   Tue Jan 21 12:30:22 2020 -0800
> >>> 
> >>>   rcu: Don't flag non-starting GPs before GP kthread is running
> >>> 
> >>>   Currently rcu_check_gp_start_stall() complains if a grace period takes
> >>>   too long to start, where "too long" is roughly one RCU CPU stall-warning
> >>>   interval.  This has worked well, but there are some debugging Kconfig
> >>>   options (such as CONFIG_EFI_PGT_DUMP=y) that can make booting take a
> >>>   very long time, so much so that the stall-warning interval has expired
> >>>   before RCU's grace-period kthread has even been spawned.
> >>> 
> >>>   This commit therefore resets the rcu_state.gp_req_activity and
> >>>   rcu_state.gp_activity timestamps just before the grace-period kthread
> >>>   is spawned, and modifies the checks and adds ordering to ensure that
> >>>   if rcu_check_gp_start_stall() sees that the grace-period kthread
> >>>   has been spawned, that it will also see the resets applied to the
> >>>   rcu_state.gp_req_activity and rcu_state.gp_activity timestamps.
> >>> 
> >>>   Reported-by: Qian Cai <cai@lca.pw>
> >>>   Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >>> 
> >>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >>> index 04718bc..d9d619d 100644
> >>> --- a/kernel/rcu/tree.c
> >>> +++ b/kernel/rcu/tree.c
> >>> @@ -1209,7 +1209,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
> >>> 	trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("Startedroot"));
> >>> 	WRITE_ONCE(rcu_state.gp_flags, rcu_state.gp_flags | RCU_GP_FLAG_INIT);
> >>> 	WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
> >>> -	if (!rcu_state.gp_kthread) {
> >>> +	if (!READ_ONCE(rcu_state.gp_kthread)) {
> >>> 		trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
> >>> 		goto unlock_out;
> >>> 	}
> >>> @@ -1259,10 +1259,10 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
> >>> */
> >>> static void rcu_gp_kthread_wake(void)
> >>> {
> >>> -	if ((current == rcu_state.gp_kthread &&
> >>> +	if ((current == READ_ONCE(rcu_state.gp_kthread) &&
> >>> 	     !in_irq() && !in_serving_softirq()) ||
> >>> 	    !READ_ONCE(rcu_state.gp_flags) ||
> >>> -	    !rcu_state.gp_kthread)
> >>> +	    !READ_ONCE(rcu_state.gp_kthread))
> >>> 		return;
> >>> 	WRITE_ONCE(rcu_state.gp_wake_time, jiffies);
> >>> 	WRITE_ONCE(rcu_state.gp_wake_seq, READ_ONCE(rcu_state.gp_seq));
> >>> @@ -3619,7 +3619,10 @@ static int __init rcu_spawn_gp_kthread(void)
> >>> 	}
> >>> 	rnp = rcu_get_root();
> >>> 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> >>> -	rcu_state.gp_kthread = t;
> >>> +	WRITE_ONCE(rcu_state.gp_activity, jiffies);
> >>> +	WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
> >>> +	// Reset .gp_activity and .gp_req_activity before setting .gp_kthread.
> >>> +	smp_store_release(&rcu_state.gp_kthread, t);  /* ^^^ */
> >>> 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >>> 	wake_up_process(t);
> >>> 	rcu_spawn_nocb_kthreads();
> >>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> >>> index 476458c..75f6e9f 100644
> >>> --- a/kernel/rcu/tree_stall.h
> >>> +++ b/kernel/rcu/tree_stall.h
> >>> @@ -578,6 +578,7 @@ void show_rcu_gp_kthreads(void)
> >>> 	unsigned long jw;
> >>> 	struct rcu_data *rdp;
> >>> 	struct rcu_node *rnp;
> >>> +	struct task_struct *t = READ_ONCE(rcu_state.gp_kthread);
> >>> 
> >>> 	j = jiffies;
> >>> 	ja = j - READ_ONCE(rcu_state.gp_activity);
> >>> @@ -585,8 +586,7 @@ void show_rcu_gp_kthreads(void)
> >>> 	jw = j - READ_ONCE(rcu_state.gp_wake_time);
> >>> 	pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
> >>> 		rcu_state.name, gp_state_getname(rcu_state.gp_state),
> >>> -		rcu_state.gp_state,
> >>> -		rcu_state.gp_kthread ? rcu_state.gp_kthread->state : 0x1ffffL,
> >>> +		rcu_state.gp_state, t ? t->state : 0x1ffffL,
> >>> 		ja, jr, jw, (long)READ_ONCE(rcu_state.gp_wake_seq),
> >>> 		(long)READ_ONCE(rcu_state.gp_seq),
> >>> 		(long)READ_ONCE(rcu_get_root()->gp_seq_needed),
> >>> @@ -633,7 +633,8 @@ static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
> >>> 
> >>> 	if (!IS_ENABLED(CONFIG_PROVE_RCU) || rcu_gp_in_progress() ||
> >>> 	    ULONG_CMP_GE(READ_ONCE(rnp_root->gp_seq),
> >>> -	    		 READ_ONCE(rnp_root->gp_seq_needed)))
> >>> +	    		 READ_ONCE(rnp_root->gp_seq_needed)) ||
> >>> +	    !smp_load_acquire(&rcu_state.gp_kthread))
> >>> 		return;
> >>> 	j = jiffies; /* Expensive access, and in common case don't get here. */
> >>> 	if (time_before(j, READ_ONCE(rcu_state.gp_req_activity) + gpssdelay) ||
> 

      reply	other threads:[~2020-01-22 16:50 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-21  4:25 Boot warning at rcu_check_gp_start_stall() Qian Cai
2020-01-21  5:09 ` Paul E. McKenney
2020-01-21 11:45   ` Qian Cai
2020-01-21 14:19     ` Paul E. McKenney
2020-01-21 14:37       ` Qian Cai
2020-01-21 16:15         ` Paul E. McKenney
2020-01-21 19:09           ` Qian Cai
2020-01-21 20:46             ` Paul E. McKenney
2020-01-22  4:16               ` Qian Cai
2020-01-22  5:07                 ` Paul E. McKenney
2020-01-22  5:20                   ` Qian Cai
2020-01-22 16:50                     ` Paul E. McKenney [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200122165014.GG2935@paulmck-ThinkPad-P72 \
    --to=paulmck@kernel.org \
    --cc=cai@lca.pw \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rcu@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).